THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XXIV February, 1933 Number 2 














THE TIME INTERVAL BETWEEN TEST AND RE-TEST 
IN ITS RELATION TO THE CONSTANCY OF THE 
INTELLIGENCE QUOTIENT* 


RALPH R. BROWN 


STATEMENT OF THE PROBLEM 


The present paper has a twofold purpose: First, to determine the 
effect of the time interval between test and retest on the constancy 
of the intelligence quotient; and secondly, to supplement our knowl- 
edge of mental growth through the longitudinal method of develop- 
mental study. For more than a decade children have been tested, 
predictions have been made, and labels have been attached by way of 
the IQ. How valid were these predictions? 

Although there have been a large number of studies concerned 
with the constancy of the IQ, the interval between tests has been 
usually less than three years, as may be seen in Table I. 

This table presents the results found in most of the previous 
studies on the constancy of the intelligence quotient. According to 
T. G. Foran,⁷,⁸ who has been reviewing the literature on this subject
for the past five years, Baldwin and Stecher's contribution contains
the best available data concerning the effect of the long-time interval
on the constancy of the IQ.¹ These workers, however, did not have a
sufficient number of cases upon which to base reliable conclusions, and 
their data are not presented in a form comparable to those of other 
studies. Also, the same individuals were not measured at each succes- 
sive time interval. Slocombe*® was recently criticized by Foran 
because of his misinterpretation of the data. Baldwin and Stecher's
data are derived from studies on normal and superior children.

* This paper is an abstract of a thesis done at the Institute for Juvenile Research
in partial fulfillment for the degree of Master of Arts at the University of Illinois.
Studies from the Institute for Juvenile Research, Chicago. Series C, No. 220.

The data in this study were derived from the case reports of children 
presenting some sort of behavior difficulty which brought them to the 
Institute for Juvenile Research for an examination. The mean 
IQ of this group is 81.9. An interval of time, per se, could not, of
course, affect the IQ rating. The change results from the interplay
of inherited and environmental factors operating within and without
the individual. The meaning of the phrase, "length of time interval"
in this paper, therefore, is "the extent to which the organism was
exposed to the effects of heredity and environment." And the question
of nature versus nurture in regard to the change in IQ must be
left unanswered at the present time.


TABLE I.*—CORRELATIONS PRESENTED IN FORMER STUDIES BETWEEN TESTS AND
RE-TESTS ON THE STANFORD-BINET

Author                        N       r      Interval
[illegible]                  114    .948
[illegible]                  274    .72
Rugg and Colloton            137    .84
[illegible]                  435    .93
[illegible]                   44    .84
[illegible]                  [illegible]
[illegible]                  [illegible]
Cuneo and Terman              31    .85
                              21    .94
                              25    .95
[illegible]                  298    .88    One-year interval
                             127    .91    Two-year interval
                              42    .83    Three-year interval
Baldwin and Stecher           40    .85    Two-year interval
                              40    .73    Three-year interval
                              40    .77    Four-year interval
                              40    .81    Five-year interval
                              40    .81    Six-year interval
[illegible]                  ...    .88    One-year interval
                             ...    .91    Two-year interval
                             ...    .83    Four-year interval
[illegible]                  221    .91    Less than one-year interval
                             320    .87    One-year interval
                              99    .88    Two-year interval
                              41    .87    Three-year interval
Gray and Marsden             109    .851   One- to five-year interval
Hildreth                     441    .814

* A large part of this table is taken directly from a table presented by L. S.
Rugg in his article published in 1925 in the Journal of Educational Psychology.


DESCRIPTION OF THE DATA 


The Stanford-Binet test results of one hundred twenty-four 
“problem” children who were examined two or more times at the 
Institute for Juvenile Research, Chicago, furnish the data used for 
this study. The number of tests given to each child ranged from two 
to nine, with an average of 3.34. The time intervals between the 
first and the last test ranged from five to twelve years, with the mean 
at 7.67 years. Table II presents the data in more detail. 


TABLE II.—DESCRIPTION OF THE DATA

N = 124
                 First test, years      Last test, years
CA
  Mean             8.19 ± .17            15.86 ± .21
  S.D.             2.78 ± .12             3.48 ± .15
  Range            3.16 to 16.42         10.25 to 28.42
MA
  Mean             6.52 ± .13            11.38 ± .17
  S.D.             2.17 ± .09             2.87 ± .12
  Range            3.0 to 12.83           3.3 to 18.59
IQ
  Mean            81.94 ± 1.1            78.02 ± 1.25
  S.D.            18.34 ± .79            20.62 ± .88
  Range            .46 to 1.31            13 to 1.30











In a study of the constancy of the intelligence quotient in "problem"
children over a short-time interval, Andrew W. Brown³
found the average IQ on 707 first examinations to be 78.74, with a 
sigma of 15.7. The slightly higher average IQ of 81.9 found in the 
present study does not appear to be of sufficient significance to render 
the data atypical from the standpoint of intelligence. 
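
The plus-or-minus figures attached to the means and sigmas in Table II appear to be probable errors. The paper does not state the formulas used; the sketch below assumes the conventional ones, PE of a mean = .6745 sigma / sqrt(N) and PE of a sigma = .6745 sigma / sqrt(2N), and applies them to the IQ rows of the table. The function names are illustrative only.

```python
import math

PE_FACTOR = 0.6745  # a probable error is 0.6745 times the standard error


def pe_mean(sd, n):
    """Probable error of a mean (assumed formula, not stated in the paper)."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_sd(sd, n):
    """Probable error of a standard deviation (assumed formula)."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


n_cases = 124
# IQ sigmas on the first and last tests, as given in Table II.
for label, sd in [("first test", 18.34), ("last test", 20.62)]:
    print(label, round(pe_mean(sd, n_cases), 2), round(pe_sd(sd, n_cases), 2))
```

Under these assumptions the computed values agree, to the precision printed, with the 1.1 and 1.25 attached to the mean IQs and the .79 and .88 attached to the sigmas.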

The reasons for referring the children in this study to the clinic 
do not seem to differ in any respect from the causes which bring the 
average run of children into the Institute. They embrace such problems
as the following: truancy, enuresis, "hard to manage," "nervousness,"
"lying," "stealing," "subnormality," "masturbation," "advice on
adoption," etc.

Upon retest, it was found that eight of the one hundred twenty-four
individuals had been in institutions for the feeble-minded for more than
a year. Psychotic and epileptic individuals were not included in
the data.


TABLE III.—MEANS AND DEVIATIONS OF CA, MA, AND IQ, AT TWELVE-MONTH
INTERVALS

Interval,                     CA, years                MA, years                   IQ
months       N   Test      M      PEm     SD        M      PEm     SD        M      PEm     SD
1-12        36   1st     8.94    .359    3.19     6.61    .234    2.08    75.58    1.32   11.71
                 2nd     9.58    .351    3.12     7.25    .235    2.09    77.08    1.53   13.64
13-24       45   1st     9.12    .307    3.05     7.12    .214    2.13    79.73    1.39   13.8
                 2nd    10.16    .29     2.95     8.21    .192    1.91    79.00    1.39   13.8
25-36       48   1st     9.31    .318    3.26     7.33    .248    2.55    78.94    1.26   12.94
                 2nd    11.98    .307    3.15     9.19    .251    2.58    79.63    1.29   13.28
37-48       36   1st     8.81    .345    3.07     7.11    .278    2.48    81.41    1.35   12.00
                 2nd    12.42    .337    2.99     9.64    .231    2.06    80.42    1.37   12.22
49-60       43   1st     9.08    .296    2.88     7.17    .238    2.31    79.50    1.65   16.05
                 2nd    13.83    .298    2.90    10.31    .240    2.34    77.83    1.55   15.10
61-72       42   1st     8.95    .287    2.75     7.00    .223    2.14    79.43    1.38   13.26
                 2nd    14.52    .294    2.82    10.48    .250    2.41    75.71    1.65   15.83
73-84       63   1st     8.18    .182    2.14     6.56    .154    1.81    80.05    1.28   15.10
                 2nd    14.74    .191    2.25    10.83    .218    2.57    76.27    1.78   20.97
85-96       34   1st     7.60    .278    2.40     5.94    .223    1.93    81.00    1.98   17.01
                 2nd    15.03    .297    2.57    10.88    .248    2.14    75.91    1.85   15.95
97-108      32   1st     8.53    .397    3.33     6.97    .350    2.94    84.50    2.36   19.80
                 2nd    16.84    .395    3.31    12.38    .345    2.89    81.09    2.14   17.98
109-145*    22   1st     9.33   P<.01   S=3.8     6.91   P<.01   S=2.5    78.18   P<.01  S=21.2
                 2nd    19.8    P<.01   S=4.0    11.58   P<.01   S=3.5    73.23   P<.01  S=22.5

* Data here figured by Fisher's method for small samples.


Inasmuch as many individuals had more than two tests, it was
possible to analyze the data by increasing time intervals. Table III
presents the means and sigmas for these data. The reader should
bear in mind that when the data are presented in this manner, the
same individuals are not all used in each interval.


ANALYSIS AND RESULTS 


There is a positive correlation of .79 ± .02 between the IQ's on
the first and last examinations of the one hundred twenty-four cases
used in this study, in which the average time interval between test
and retest is 7.67 years. This is significantly lower than the
correlations found in previous studies where shorter time intervals are
used. The best comparable study is that of Andrew W. Brown,³ in
which a correlation of .88 is given between tests on problem children
when the average time interval is fifteen months.


TABLE IV.—CORRELATIONS, MEANS, STANDARD DEVIATIONS, AND RANGES
BETWEEN THE FIRST AND SECOND TESTS AFTER A 1-TO-24 MONTHS'
INTERVAL AS COMPARED WITH THOSE OF THE SAME INDIVIDUALS
AFTER AN INTERVAL OF 60 TO 145 MONTHS

                           1-24 months        60-145 months
Correlation                 .86 ± .02           .61 ± .06
Mean IQ, first test       78.53 ± 1.14        79.17 ± 1.24
Mean IQ, second test      79.62 ± 1.22        75.48 ± 1.37
S.D., first test          12.88 ± .81         14.05 ± .88
S.D., second test         13.88 ± .87         15.48 ± .97
Range, first test         49 to 110 IQ        49 to 114 IQ
Range, second test        43 to 108 IQ        43 to 111 IQ


TABLE V.—CORRELATIONS (AND STANDARD DEVIATIONS) BETWEEN THE IQ ON
THE FIRST AND SECOND EXAMINATION WITH INCREASING TIME INTERVALS

Intervals in                          S.D.,         S.D.,
months          N     r               first test    second test
1-12           36    .86 ± .029        8.47          13.6
13-24          45    .85 ± .028       13.8           13.8
25-36          48    .80 ± .035       12.9           13.3
37-48          36    .71 ± .057       12.0           12.2
49-60          43    .79 ± .038       16.0           15.1
61-72          42    .76 ± .044       13.3           15.8
73-84          63    .77 ± .034       15.1           21.0
85-96          34    .79 ± .043       17.1           15.9
97-108         32    .78 ± .047       19.8           18.0
109-145*       22    .90, P < .01     S1 = 21.2      S2 = 22.5

* Data here handled by Fisher's small sample method.⁶

Fifty-eight of the one hundred twenty-four cases were also tested 
within a two-year interval. By comparing the correlation between 
tests in this short interval with the correlation found in the same 
individuals after a seven-year interval, the effect of the time period 
between examinations on the constancy of the IQ is seen to be decidedly
significant. Table IV presents these correlations. Over a
period of from one to twenty-four months, the correspondence gives
a correlation of positive .86, but after a seven-year interval, we find a 
.61 relationship. 
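
The probable errors quoted with these correlations are consistent with the usual formula PE of r = .6745 (1 - r^2) / sqrt(N). The paper does not state the formula, so the following is only a sketch under that assumption; it recomputes the probable errors for the correlation on all 124 cases and for the two correlations on the 58 cases tested at both a short and a long interval.

```python
import math


def pe_r(r, n):
    """Probable error of a Pearson r, assuming 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# (r, N) pairs taken from the text and from Table IV.
reported = [(0.79, 124), (0.86, 58), (0.61, 58)]
for r, n in reported:
    print(f"r = {r:.2f}, N = {n}: PE = {pe_r(r, n):.3f}")
```

Rounded to two places, the results match the ± .02, ± .02, and ± .06 given in the paper.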

In Table V are presented the correlations found between the IQ 
on the first and second examinations with each yearly increase in the 
period of time between tests. The Pearson Product-moment method 
of correlation has been used for all but the last (109-145) interval. 
Since there were only twenty-two cases here, Fisher's method for small
samples⁶ was used.

With the exception of the last interval, the correlations between 
test and retest for those examined within a two-year interval are 
significantly higher than for those who were examined after a three- 
year time period. Because of the few cases in the last time interval, 
it would be unsafe to interpret the .90 correlation. Part of the expla- 
nation for this apparent discrepancy, however, lies, no doubt, in the 
fact that there was a definite tendency for the individuals to lose 
during this long-time period. There were only 33 IQ points gained 
during this interval as against 141 IQ points lost. No individual 
gained more than six points, but eight out of the twenty-two lost more 
than six points. (These data are presented later, in Table VII.)
Although actually there was a tendency toward point loss, the relative 
standing of the individuals did not vary greatly; the correlations there- 
fore are high. 
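
The "P < .01" attached to the .90 correlation presumably comes from Fisher's t test for a correlation from a small sample. As a sketch only, assuming t = r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and taking the two-sided one per cent point of t for 20 degrees of freedom (2.845) from a standard table, the check runs as follows.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-sided 1 per cent point of t for 20 degrees of freedom (standard table value).
T_CRIT_01_DF20 = 2.845

t_value = t_for_r(0.90, 22)
print(round(t_value, 2), t_value > T_CRIT_01_DF20)
```

The statistic falls well beyond the tabled value, which agrees with the reported P < .01. This tests only whether the correlation differs from zero; it says nothing about the point losses discussed above.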

It is perhaps well to emphasize that a high correlation between the 
IQ on test and retest does not imply constancy of the IQ unless the 
mean IQ remains unchanged. We are not concerned here with rela- 
tive changes, but rather, actual IQ point changes as they affect the 
individual ratings. In 1921 Doll made a study of the constancy of 
the IQ. He stated, ‘Only one subject out of a total of one hundred 
six feeble-minded subjects who were below sixteen years at the first 
examination maintains an IQ which is in accord with the theory that 
the IQ is constant.’”’ Terman?? then took Doll’s data and computed 
a correlation which turned out to be .96. It is quite probable that
instead of disproving Doll’s conclusion, this high degree of correspond- 
ence merely indicated that the large changes in IQ points affected 
practically all individuals in about the same way. Probably most of 
the individuals lost, but so long as the relative standing of the indi- 
viduals was not seriously affected, there would be a high correlation. 
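
The point can be illustrated with a small constructed example. The scores below are hypothetical and are not drawn from Doll's data or from the present study; every child is simply made to lose twelve points on retest.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical first-test IQs; on retest each child loses twelve points.
first = [55, 62, 70, 74, 81, 88, 93, 101]
retest = [iq - 12 for iq in first]

r = pearson_r(first, retest)
mean_change = sum(b - a for a, b in zip(first, retest)) / len(first)
print(round(r, 2), mean_change)
```

The correlation is perfect although no child's IQ remained constant, since the coefficient reflects only relative standing and is unaffected by a loss common to all.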

Again, the range is a potent factor in raising or lowering the
correlations. For instance, in Brown's data³ for seven hundred seven cases
ranging from twenty to one hundred thirty IQ, the correlation between
test and retest is .88. When, however, the range is restricted to three
different intelligence levels, the correlations drop to .81, .68,
and .61. The same individuals, the same tests, and the
same IQ's are used.

Since, in dealing with the constancy of the intelligence quotient, 
we are interested in actual, not relative, changes, it is unwise to place 
much confidence in the correlation technique on such data unless it is 
supplemented by other measures. 
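
The influence of range on the correlation, noted above for Brown's data, can be shown with simulated scores. Everything in the sketch below is hypothetical: each case is given a stable level (mean 80, sigma 18) plus an independent error (sigma 6) on each of two testings, and the cut points of 70 and 90 are arbitrary. None of the figures come from Brown's data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical test and retest scores for 3000 cases.
pairs = []
for _ in range(3000):
    level = random.gauss(80, 18)
    pairs.append((level + random.gauss(0, 6), level + random.gauss(0, 6)))


def r_within(low, high):
    """Test-retest r for cases whose first score lies in [low, high)."""
    group = [(a, b) for a, b in pairs if low <= a < high]
    return pearson_r([a for a, _ in group], [b for _, b in group])


r_full = r_within(-math.inf, math.inf)
r_low = r_within(-math.inf, 70)
r_mid = r_within(70, 90)
r_high = r_within(90, math.inf)

print(f"full range: {r_full:.2f}")
print(f"below 70:   {r_low:.2f}")
print(f"70 to 89:   {r_mid:.2f}")
print(f"90 and up:  {r_high:.2f}")
```

The same simulated cases yield a lower coefficient within each restricted level than over the full range, which is the pattern reported for Brown's three intelligence levels.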

When the data are presented for the number of individuals who 
lost, gained, or remained constant in their intelligence rating over this 
period, the effect of the time interval on the constancy of the IQ is
seen to be decidedly significant. 


TABLE VI.—DIRECTION AND AMOUNT OF CHANGE IN IQ AFTER AN INTERVAL OF
FROM SIXTY TO ONE HUNDRED FORTY-FIVE MONTHS

N = 124
                                        Number     Per cent
Individuals who gained                    19         15.3
Individuals who lost                      53         42.7
Individuals remaining constant*           52         42
Points gained                            412         31.8
Points lost                              882         68.2
Total average change                      10.4

* Change of six points or less in either direction is considered constant.


As shown in Table VI, the average change in IQ points is 10.4— 
almost twice as great as the change over a short period. Further- 
more, there is a significant tendency for individuals to lose in their 
rating rather than to gain. The number of points lost is more than 
twice the number gained. Fifty-three individuals lost more than six 
points as compared with nineteen who increased their intelligence 
quotients above what is considered constant. 
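
The tallies in Table VI can be reproduced from paired scores by a simple rule. The sketch below assumes, from the footnote to the table and from the arithmetic of the tables, that a change of six points or less counts as constant, that points gained and lost are summed over all cases regardless of that limit, and that the total average change is the sum of points gained and lost divided by the number of cases. The five pairs of scores are hypothetical.

```python
def summarize_changes(first, last, tolerance=6):
    """Tally IQ changes between two testings in the manner of Table VI."""
    diffs = [b - a for a, b in zip(first, last)]
    gained = sum(1 for d in diffs if d > tolerance)
    lost = sum(1 for d in diffs if d < -tolerance)
    points_gained = sum(d for d in diffs if d > 0)
    points_lost = sum(-d for d in diffs if d < 0)
    return {
        "gained": gained,
        "lost": lost,
        "constant": len(diffs) - gained - lost,
        "points_gained": points_gained,
        "points_lost": points_lost,
        "average_change": (points_gained + points_lost) / len(diffs),
    }


# Hypothetical first and last IQs for five children.
first = [80, 75, 90, 100, 66]
last = [70, 78, 82, 107, 66]
summary = summarize_changes(first, last)
print(summary)

# The same arithmetic applied to the totals printed in Table VI.
table_vi_average = (412 + 882) / 124
print(round(table_vi_average, 1))
```

Applied to the printed totals of 412 points gained and 882 points lost over 124 cases, the last line returns the 10.4 given in the table.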

The amount and direction of change in relation to the length of the
time interval between the first and second test is shown in Table VII.
Here a constant increase in change upon retest is seen with each
twelve-month increment in the time interval up to seven years, after which
there is a steady decline in change. The relatively few cases in the
three latter intervals deter one from drawing conclusions from this
effect. This table brings out more clearly than Table VI the tendency
for individuals to lose after a long-time interval. Those individuals
who were retested after a period of less than three years gained in
their IQ rating, but after a five-year period, there is a decided tendency
toward losses. In analyzing this table, the reader should hold in mind
the fact that the same individuals are not all used in each time interval.


TABLE VII.—AMOUNT AND DIRECTION OF CHANGE WITH INCREASING TIME
INTERVALS

                                                Months
                         1-12  13-24  25-36  37-48  49-60  61-72  73-84  85-96  97-108  109-145
N                          36     45     48     36     43     42     63     34      32       22
Individuals gained
  Number                    9      6     10      9     11      6     12      5       4        0
  Per cent                 25     13     21     25     26     14     19     15      13        0
Individuals constant
  Number                   23     30     27     12     16     18     21     13      17       14
  Per cent                 64     67     56     33     37     43     33     38      53       64
Individuals lost
  Number                    4      9     11     15     16     18     30     16      11        8
  Per cent                 11     20     23     42     37     43     48     47      34       36
Points gained
  Number                  123    108    166    122    147    104    237     82     103       33
  Per cent                 64     42     55     42     39     28     32     24      34       19
Points lost
  Number                   70    152    135    170    229    262    511    262     196      141
  Per cent                 36     58     45     58     61     72     68     76      66       81
Total average change     5.36   5.78   6.27   8.11   8.74   8.70  11.87  10.11    9.34     7.90

Fifty-eight of the individuals whose test results are presented in 
Table VIII also had examinations at an interval of two years or less. 
As a check on the data presented in the two foregoing tables, the 
amount of change after a short interval is contrasted with the change 
in the same persons after a long-time period. 


TABLE VIII.—THE CHANGE IN IQ AFTER AN INTERVAL OF ONE TO TWENTY-FOUR
MONTHS AS COMPARED WITH CHANGE IN SAME INDIVIDUALS AFTER AN
INTERVAL OF SIXTY TO ONE HUNDRED FORTY-FIVE MONTHS

N = 58
                              1-24 months           60-145 months
                           Number   Per cent     Number   Per cent
Individuals gained            14       24           10       17
Individuals lost               7       12           27       47
Individuals constant          37       64           21       36
Points gained                192       57          201       32
Points lost                  142       43          421       68
Total average change            5.76                  10.72


Again the variability is seen to be almost twice as great in the
longer period. The change during the two-year period conforms very
well to the accepted expectancy of IQ change upon retest. During
the short interval sixty-four per cent of the individuals changed their
ratings only six points or less, whereas after a five-year period the
percentage of constancy drops to thirty-six, and the mean variability
is 10.72, almost double the change noted in the shorter interval. This
table also shows the same tendency toward a loss in IQ points after a
long-time period. The number of individuals who gained is twice as
great as those who lost in the short interval, but after the passage of
a few years, this condition is reversed.


TABLE IX.—DIRECTION OF CHANGE BETWEEN IQ'S OF THOSE PAST THEIR
SIXTEENTH BIRTHDAY AND THOSE BELOW ON FINAL EXAMINATION

                           Under 16 years,       16 years or older,
                               N = 74                 N = 50
                           Number   Per cent     Number   Per cent
Individuals gained            16       22            3        6
Individuals lost              32       43           21       42
Individuals constant          26       35           26       52
Points gained                310       36          102       23
Points lost                  547       64          335       77
Total average change           11.58                  8.74


The consistent tendency toward a loss in IQ points after a long 
interval of time seems to indicate either a defect in the Stanford-Binet 
examination at the upper levels, or that sixteen is not the proper 
average limit of mental development—or both. Table IX throws 
some light on this question by showing the direction and amount of 
change for those individuals whose last tests came before they arrived 
at sixteen years of age and those whose tests came at sixteen or later. 

As may be noted in the above table, there is a much more significant 
tendency for those who were tested on or after sixteen to lose or remain 
constant than for those who tested earlier to do so. Only six per cent 
of the older individuals gained as contrasted with a twenty-two per 
cent gain in the younger. The variability of the older group, however, 
is much less. Both groups, however, show a decided tendency toward 
loss. 

In an article published in 1921, Terman,” in reference to the use 
of sixteen years as the average upper limit of mental development 
stated, ‘‘My own sixteen-year estimate may be too high. As it was 
frankly tentative, I do not feel called upon to defend it. Fifteen years 
may be nearer the truth. Fourteen may be, but I doubt it. Any- 
body’s estimate at present is of course only guesswork.” Since that 
time Chipman⁴ has shown that in the feeble-minded, at least, sixteen
years is a much more practicable estimate than is fourteen. Her data
indicate that there is an undue number of gains in IQ upon retest
when fourteen is used. When sixteen is used, the number of individuals
who lose is more than double the number who gain. When
fourteen is used, the number of gains is over ten times as great as the
number of losses. Table X presents the direction of change in IQ
resulting from the use of fourteen, fifteen, and sixteen years as the
average upper age limit on those individuals in the present data whose
first rating came before fourteen and whose second rating came on or
after sixteen years.


TABLE X.—THE DIRECTION OF CHANGE IN IQ RESULTING FROM THE USE OF
FOURTEEN, FIFTEEN, AND SIXTEEN YEARS AS THE UPPER AGE LIMIT

N = 50
                                14                  15                  16
                           No.   Per cent      No.   Per cent      No.   Per cent
Individuals gained          23      46          15      30           3       6
Individuals lost             4       8          12      24          21      42
Individuals constant        23      46          23      46          26      52
Points gained              391      80         210      51         102      23
Points lost                 99      20         199      49         335      77
Total average change         9.80                8.18                8.74

The results presented in the above table agree with those of Chip- 
man in showing an excessive number of gains when fourteen is used 
and a decided tendency toward loss when sixteen is used. The num- 
ber of gains and losses, both for individuals and for point-change is 
more equally divided when fifteen is used. And the average point- 
change is less. Although there are three more individuals remaining 
constant when sixteen is used rather than fifteen, this slight increase is 
probably insignificant because of the small total number of cases. 
These data and those of Chipman seem to indicate that the average
upper limit of mental development lies somewhere between fifteen and 
sixteen years in ‘‘problem” children and mentally defective indi- 
viduals, at least. 
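
The effect of the assumed upper limit on a single rating may be sketched as follows. The sketch takes the usual Stanford-Binet ratio, IQ = 100 x MA / CA, with the chronological age of anyone past the assumed limit replaced by that limit; the particular child and the function name are hypothetical.

```python
def iq(mental_age, chronological_age, upper_limit=16.0):
    """Ratio IQ, with chronological age capped at the assumed upper limit."""
    divisor = min(chronological_age, upper_limit)
    return 100.0 * mental_age / divisor


# Hypothetical child: MA 8 at CA 10 on the first test, MA 12 at CA 17 on retest.
first_iq = iq(8, 10)
for limit in (14, 15, 16):
    retest_iq = iq(12, 17, upper_limit=limit)
    print(limit, round(retest_iq, 1), round(retest_iq - first_iq, 1))
```

For this one hypothetical child the same retest performance appears as a gain when fourteen is used, no change when fifteen is used, and a loss when sixteen is used, which is the direction of the shift shown in Table X.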


TABLE XI.—DIRECTION OF IQ CHANGE AFTER RETEST BETWEEN THOSE WHO
SCORED EIGHTY OR ABOVE ON FIRST TEST AND THOSE WHO SCORED
BELOW EIGHTY IQ

                           Below eighty IQ,      Eighty IQ and above,
                               N = 63                  N = 61
                           No.    Per cent       No.    Per cent
Individuals gained          10       16            9       15
Individuals lost            24       38           29       48
Individuals constant        29       46           23       37
Points gained              190       34          222       30
Points lost                370       66          512       70
Total average change          8.89                 12.03








In regard to the correspondence between intelligence and the tendency
toward variability in IQ ratings upon retest, former studies have
shown that mental defectives, normal, and superior children vary to
about the same degree. However, in his study of behaviour-problem
children, Andrew W. Brown found the brighter children to be more
variable. The average time interval between tests in this study was
fifteen months, the number of cases being seven hundred seven. The
present data agree with those of Dr. Brown in showing that in "problem"
children, at least, the brighter children are more variable.

As may be seen in Table XI, the average change in IQ points upon
retest for those who rated eighty or above on the first is 12.03; whereas
for those who rated below eighty, the change is only 8.89. The direction
of change, however, does not seem to be materially affected by
the intelligence level. Both groups show a significant tendency toward
loss on the last test.

The distribution of the amount of change over this long-time
period is presented in Table XII.


TABLE XII.—DISTRIBUTION OF AMOUNT OF CHANGE DURING INTERVAL OF SIXTY
TO ONE HUNDRED FORTY-FIVE MONTHS

Amount of change, IQ points     No. of cases     Per cent
0 to 5                               46            37.1
6 to 10                              23            18.5
11 to 15                             24            19.4
16 to 20                             15            12.1
21 to 25                             10             8.1
26 to 30                              2             1.6
More than 30                          4             3.2

TOTAL AVERAGE CHANGE 10.4


This distribution is quite different from that secured when a short
interval is used. Brown's analysis of a similar group of cases with
an average time interval of fifteen months shows that 58.9 per cent
change five points or less, 25.8 per cent change more than five but less
than eleven, 10.3 per cent change more than ten but less than sixteen,
2.2 per cent change from sixteen to twenty points, 1.7 per cent
change from twenty-one to twenty-five points, .9 per cent change from
twenty-six to thirty points, and .1 per cent change more than thirty.
Thus, when the interval between tests is less than three years, the
chances are less than five out of a hundred that the IQ, upon retest,
will vary more than fifteen points. But after a seven-year interval,
there are twenty-five chances out of a hundred that the IQ will have
changed more than fifteen IQ points. Such variation offers a large
possibility of a change in rating from mental defectiveness to dull
normal, dull normal to high average, high average to very superior, 
and vice versa. 
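
The two statements of chance can be checked from the distributions themselves. The sketch below uses the case counts of Table XII and the percentages quoted from Brown, and assumes that the seven classes of both run in the same order, the first three covering changes of fifteen points or less.

```python
# Cases per class in Table XII (long interval), in the order printed.
table_xii_cases = [46, 23, 24, 15, 10, 2, 4]
# Per cents quoted from Brown's short-interval study, in the same order.
brown_per_cent = [58.9, 25.8, 10.3, 2.2, 1.7, 0.9, 0.1]


def share_beyond_fifteen(classes):
    """Per cent of the total falling in the classes above fifteen points."""
    return 100 * sum(classes[3:]) / sum(classes)


long_interval = share_beyond_fifteen(table_xii_cases)
short_interval = share_beyond_fifteen(brown_per_cent)
print(round(long_interval, 1), round(short_interval, 1))
```

On this reading, about twenty-five cases in a hundred change more than fifteen points after the long interval, against fewer than five in a hundred after the short one.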


A COMMENT ON MENTAL GROWTH 


When the mental age secured by consecutive reexaminations is 
plotted against the chronological age, the expected straight diagonally- 
upward line is seldom secured. A graph upon which ten or twenty 
individual mental growth curves are plotted appears like a number of 
"zig-zag" lines scribbled on top of one another. And the same sort
of picture is seen when physical growth curves are plotted, as may be 
noted from Baldwin’s individual curves for physical growth. Speak- 
ing of physical development, Kirkpatrick,’® in his Fundamentals of 
Child Study says, ‘‘There are also tendencies to certain accelerations 
of growth which are peculiar to individuals, for not all children, even 
of the same family, grow at the same rate at the same age. Neither 
do they all attain the same size when outer influences are the same. 
The amount and rate of growth of every child is thus largely determined
by germ heredity." The writer does not quote this as an innovation
to the mental growth concept, but merely to call attention to
the fact that curves derived from the cross-sectioning of a large popu- 
lation are not directly applicable to individual cases. In the main, 
our growth curves, both physical and mental, have been derived from 
just such cross-sectioning, the validity of which, for individual pre- 
dictions, is extremely doubtful. 

Mental development does not seem to increase by equal increments 
in correspondence to time, and as yet we cannot state the upper limit 
of growth with any high degree of accuracy in individual cases. Chip- 
man has shown that twenty-eight per cent of the morons who were 
tested after sixteen years of age increased their IQ more than five 
points upon retest. 

Speaking of the height-weight ratio as an index of physical fitness, 
Buford Johnson" states, ‘‘This criterion of fitness which has received 
wide application has been determined by obtaining averages for large 
groups of children at different ages, irrespective of hereditary tend- 
encies. Until the rate of growth and variations in relative weight for 
height for different stocks have been determined, it is unwise to adopt 
such a criterion of physical fitness." The same caution should be
used in predictions of mental growth. Possibly different racial stocks 
vary in rate and range of development. 













That wide variations in consecutive IQ ratings may be due to 
inherited factors seems quite plausible. In the past we have been too 
prone to attribute these inconstancies to physical disease or environ- 
mental changes. Gesell!? has shown that often extreme physical 
handicap does not materially affect mental growth. Since compiling 
the data for this paper, the writer found a case which had been included 
by mistake. This child had infantile cerebral palsy with epileptiform 
seizures. A recheck on his IQ after a seven-year interval shows a loss 
of only two points—less than the average variation over a short-time 
period. Examination of the physical status of those individuals used 
in the present study whose IQ’s changed more than twenty points 
after a long-time interval as compared with those who changed less 
than five indicates no apparent explanatory differences from the stand- 
point of physical health. In general, this is in agreement with other 
studies. Garrett! quotes experiments which show that even malnu- 
trition does not lower, but even tends to raise the IQ. Agnes L. 
Rogers, Dorothy Burling, and Katherine McBride, in their study on 
the effect of environment upon the constancy of the intelligence 
quotient state, ‘‘These results show no appreciable effect of environ- 
mental change upon the intelligence quotient.’’!’ 

From the clinical aspect, at least, it is precarious to accept the 
conclusions of those workers whose concepts of mental growth have 
been derived from the cross-sectioning of large samples. In addition 
to the already large number of factors listed as possible causes for the 
inconstancy of an IQ rating, the race and intelligence level of the 
parents should be considered. 


CONCLUSIONS 


1. A time interval of from five to nine years between two Stanford- 
Binet tests increases the variability in IQ rating to almost twice that 
found when the time interval is less than two years. 

2. There are about twenty-five chances out of a hundred for the IQ 
to change more than fifteen points after a seven-year interval. 

3. There is a definite tendency for individuals to lose in IQ points 
after an interval of from five to twelve years. 

4. There is a more even distribution of losses and gains in IQ points 
when fifteen is used as the average upper limit of mental development 
as tested by the Stanford-Binet examination. 


5. An excessive number of gains in IQ points is to be expected when
fourteen is used as the average upper limit of mental development as
tested by the Stanford-Binet examination.

6. There is a significant tendency for the brighter children to vary
more than the dull after a time interval of from five to twelve years.


BIBLIOGRAPHY


1. Baldwin, Bird T. and Lorle I. Stecher: Results of Consecutive Stanford-Binet
Tests. Journal of Educational Psychology, Vol. XIII, 1922.

2. Baldwin, Bird T. and Lorle I. Stecher: “The Psychology of the Preschool
Child.” D. Appleton and Company, 1925.

3. Brown, Andrew W.: The Change in Intelligence Quotients in Behaviour Problem
Children. Journal of Educational Psychology, Vol. XXI, May, 1930, pp.
341-350.

4. Chipman, Catherine E.: The Constancy of the Intelligence Quotient in Mental
Defectives. Psychological Clinic, Vol. XVIII, 1929.

5. Doll, Edgar: The Growth of Intelligence. Psychological Monographs, Vol.
XXIX, No. 2, 1921.

6. Fisher, R. A.: “Statistical Methods for Research Workers.” Oliver and
Boyd, 1930.

7. Foran, T. G.: The Constancy of the Intelligence: A Review. Catholic University
American Education Research Bulletin, 1926, No. 10.

8. Foran, T. G.: A Supplementary Review of the Constancy of the Intelligence
Quotient. Catholic University Education Research Bulletin, 1929, No. 9.

9. Freeman, F. N.: The Interpretation and Application of the Intelligence Quotient.
Journal of Educational Psychology, Vol. XII, 1921, pp. 3-13.

10. Garrison, S. C.: Additional Retests by means of the Stanford Revision of the
Binet-Simon Tests. Journal of Educational Psychology, Vol. III, pp. 307-312.

11. Gesell, Arnold: “Infancy and Human Growth.” The Macmillan Co., N. Y.,
Chap. X, 1928, p. 214.

12. Gray, P. L. and R. E. Marsden: The Constancy of the Intelligence Quotient:
Final Results. British Journal of Psychology, General Section, Vol.
XVII, 1926, pp. 20-26.

13. Hildreth, Gertrude: Stanford-Binet Re-tests of 441 School Children. Ped.
Seminary, Vol. XXXIII, 1926, pp. 365-386.

14. Johnson, Buford J.: The Mental Growth of Children, E. P. Dutton & Co., N. Y.,
1925, p. 32.

15. Kirkpatrick, Edwin A.: “Fundamentals of Child Study.” MacMillan & Co.,
N. Y., 1929, pp. 43-44.

16. Luh, C. W.: A Note on the Relation between the Constancy of the IQ and the
Rate of Mental Growth. Journal of Genetic Psychology, Vol. XXXVI,
1929, pp. 185-187.

17. Prouty, Ruth A.: Psychological Classification versus Clinical Diagnosis.
Psychological Clinic, Vol. XVIII, 1929, pp. 213-220.













18. Rogers, Agnes L., Dorothy Burling and Katherine McBride: The Effect on the
Intelligence Quotient of Change from a Poor to a Good Environment. The 
27th Year Book of the National Society for the Study of Education, 1928. 

19. Rugg, L. S.: Retests and the Constancy of the IQ. Journal of Educational 
Psychology, Vol. XVI, 1925, pp. 341-343. 

20. Slocombe, C. S.: Why the IQ is Not, and Cannot Be Constant. Journal of 
Educational Psychology, Vol. XVIII, 1927, pp. 421-423. 

21. Terman, Lewis M.: ‘The Measurement of Intelligence.’’ Houghton Mifflin Co., 
1916. 

22. Terman, Lewis M.: Mental Growth and the IQ. Journal of Educational 
Psychology, Vol. XII, 1921, pp. 325-341. 










THE EFFECT OF DIRECTIONS PRECEDING TRUE- 
FALSE AND INDETERMINATE STATEMENT 
EXAMINATIONS UPON DISTRIBUTIONS 
OF TEST SCORES 


C. C. WEIDEMANN AND LYNDALL FISHER NEWENS 


Teachers College, University of Nebraska 


I. INTRODUCTION 


Very little attention has been given to the relation of directions
preceding examinations to test scores obtained from a given examination.
An effort was made¹ to establish a few sets of clearly stated
directions for the true-false examination based upon the results of
analysis of two hundred twenty-seven sets of such directions found in
present practice.


TABLE I.—TOTAL ENROLLMENTS BY GROUPS FOR EACH OF THE FOUR TEST PERIODS

                            Total enrollments

              Semesters, 1929-1930     Semesters, 1930-1931
Test                                                            Grand
groups          First      Second        First      Second      totals

A                 63         41            70         56          230
B                 62         35            68         63          228
C                 62         34            62         56¹         214²
D                 57         33            57        ...          147
Totals....       244        143           257        175          819

1 Direction sets C and D were replaced by set F during the fourth testing period.
2 Of this total, one hundred fifty-eight were Test C, and fifty-six were Test F.

The purposes of this study are to determine certain effects of 
selected sets of directions preceding a true-false and indeterminate 
statement examination upon 





1 Weidemann, C. C.: ‘‘ How to Construct the True-false Examination,”’ Bureau 
of Publications, Teachers College, Columbia University, 1926, pp. 6-14. 




1. The time required for administration; 

2. The reliability of the true-false and indeterminate examinations; 

3. The standard variability of the distributions of test scores; and 

4. The central tendencies of the distribution of test scores when 
given to groups of students selected at random. 


II. THE SAMPLE OF DATA 


During the final examination period of the first and second semes- 
ters of the academic years, 1929-1930 and 1930-1931 respectively, 
a one hundred sixty item true-false and indeterminate statement 
examination was given in a required course, ‘‘The Foundations of 
Modern Education,” Teachers College, University of Nebraska. 
The foregoing examination was not the final examination. It was 
announced as a test for experimental purposes which would include 
content from the entire course. During three semesters, this course 
consisted of four sections offered at 8 A.M., 9 A.M., 11 A.M., and 2 P.M.
During the second semester, there were three sections meeting at 8
A.M., 9 A.M., and 11 A.M. The total enrollment in the course at the
end of each of the four semesters was two hundred forty-four, one 
hundred forty-three, two hundred fifty-seven, and one hundred 
seventy-five respectively. Table I includes the enrollments in each 
test group for each semester. Test groups are not the same as section 
enrollments; yet totals of the test groups equal total course enroll- 
ment in each test period. This study is, therefore, a report of sixteen 
(four tests times four groups) experiments. 
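
The agreement between the test-group totals and the course enrollments can be checked directly from the figures of Table I. What follows is only a minimal sketch in Python, not part of the original study; the names `ENROLLMENTS`, `group_totals`, and `period_totals` are introduced here for illustration.

```python
# Enrollment counts as printed in Table I, one list per test group, in the
# order of the four test periods. None marks the period in which set D was
# not used. The names below are illustrative and do not come from the study.
ENROLLMENTS = {
    "A": [63, 41, 70, 56],
    "B": [62, 35, 68, 63],
    "C": [62, 34, 62, 56],  # per the table footnote, the fourth-period 56 took set F
    "D": [57, 33, 57, None],
}


def group_totals(table):
    """Sum each test group's enrollment over the periods in which it was used."""
    return {
        group: sum(n for n in counts if n is not None)
        for group, counts in table.items()
    }


def period_totals(table):
    """Sum the enrollments of all test groups within each test period."""
    periods = len(next(iter(table.values())))
    return [
        sum(counts[i] for counts in table.values() if counts[i] is not None)
        for i in range(periods)
    ]


if __name__ == "__main__":
    print(group_totals(ENROLLMENTS))
    print(period_totals(ENROLLMENTS))
    print(sum(period_totals(ENROLLMENTS)))
```

Under this reading the period totals reproduce the course enrollments quoted in the text, and the grand total is the same whether summed by group or by period.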


III. GENERAL PROCEDURES 


The interval between each test was one semester. The same exam- 
ination based upon the content of the course of instruction was used 
for each of the four test periods. ‘‘ A Student’s History of Education,” 
by F. P. Graves was used as a text. During the first three test periods 
direction sets A, B, C, and D were used. During the fourth test 
period, direction sets A, B, and F were used. Below (Section IV) is a 
description of each set of directions. 

In order to secure random samples of the students as well as to 
reduce cheating to a minimum, each section was divided into four 
groups during each of the first three test periods. Students seated 
in columns 1, 2, 3, 4 had direction sets A, B, C, and D, respectively; 








columns 5, 6, 7, 8 had sets A, B, C, and D respectively, and so on. During
the fourth period of testing, the columns 1, 2, 3 had direction sets
A, B, and F respectively, and so on. After these examinations, preceded
by the given direction set, were answered, they were collected
by the instructor. Throughout the study Set D is used as the control 
set of directions. 
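
The seating scheme described above amounts to a cyclic assignment of direction sets to seat columns. The following Python sketch illustrates that reading; the function name `direction_set_for_column` and its arguments are hypothetical, and the assumption that the rotation continues unchanged beyond the columns named in the text rests only on the phrase "and so on."

```python
def direction_set_for_column(column, sets=("A", "B", "C", "D")):
    """Return the direction set for a seat column, numbering columns from 1.

    Assumes the sets simply repeat in order across the columns, as the
    text's "and so on" suggests. Names here are illustrative only.
    """
    if column < 1:
        raise ValueError("columns are numbered from 1")
    return sets[(column - 1) % len(sets)]


if __name__ == "__main__":
    # First three test periods: four sets in rotation.
    print([direction_set_for_column(c) for c in range(1, 9)])
    # Fourth test period: sets A, B, and F in rotation.
    print([direction_set_for_column(c, ("A", "B", "F")) for c in range(1, 7)])
```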

Throughout the study the same instructor, Newens, and the same 
classroom for a given section in a given semester were maintained. 
The health and interest of the instructor in her work seemed consist- 
ently favorable. The general consistency of the results indicates that 
the study was executed under conditions favorable to the statement 
of conclusions which would be valid within the limits of the data. 

All test results were scored by the key of a given set of directions 
in the hands of a reader under the direct supervision of the instructor 
of the course. The method of scoring was the number right. 
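
Scoring by "the number right" can be sketched as a simple count of agreements between a student's marks and the key for his direction set. The Python below is an illustration only; the function name `number_right` and the three-item key in the usage lines are invented for the example and do not reproduce any key used in the study.

```python
def number_right(responses, key):
    """Count the items on which the response agrees with the scoring key.

    Both arguments are sequences of marks, one per item, in item order.
    No correction for guessing is applied.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


if __name__ == "__main__":
    # Hypothetical three-item key under direction set A ("+" true, "0" false).
    key = ["+", "0", "+"]
    print(number_right(["+", "+", "+"], key))
```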


IV. THE SETS OF DIRECTIONS 


The sets of directions of this study were reported¹ by the authors
of a recent study. The sets without graphic illustration follow. 


Set A. 


1. If a statement is entirely true, put a plus (+) upon the space to the left of 
the statement. 


2. If any part or all of a statement is false, put a zero (0) upon the space to the 
left of the statement. 

3. Do not omit any statement. 

Set B. 


1. If a statement is entirely true, put a plus (+) upon the space to the left of 
the statement. 


2. If a statement is entirely false, put a zero (0) upon the space to the left of the 
statement. 

3. If a statement is doubtful as to whether true or false, put a capital letter D 
upon the space to the left of the statement. 

4. Do not omit any statement. 

Set C. 

1. If the chances seem to favor a decision that a statement is more likely to be 
true than false, put a plus (+) upon the space to the left of the statement. 

2. If the chances seem to favor a decision that a statement is more likely to be
false than true, put a zero (0) upon the space to the left of the statement. 





1 Weidemann, C. C. and L. F. Newens: A Study of a True-false and Indeter- 
minate Statement Examination in the History of Education. Journal of Educa- 
tional Research, March, 1932, Vol. XXV, No. 3, pp. 197-210. 













3. Do not omit any statement. 
Set D. 
1. Mark false statements with a zero (0). 


2. Do not mark true statements. 
Set F.


1. If a statement is true, put a figure one (1) upon the space to the left of the 
statement. 


2. If a statement is false, put a figure five (5) upon the space to the left of the 
statement. 


3. If a statement is doubtful as to whether true or false, put a figure three (3) 
upon the space to the left of the statement. 

4. If a statement is more likely to be true than doubtful, yet not entirely true, put
a figure two (2) upon the space to the left of the statement.

5. If a statement is more likely to be false than doubtful, yet not entirely false,
put a figure four (4) upon the space to the left of the statement.

6. Do not omit any statement. 


A copy of the examination may be secured by writing to either
of the authors. Samples of the items are published in the foregoing
reference.


V. TREATMENT OF THE DATA 


1. All scorings were checked but once. 

2. All sums of scores and transcriptions of data were checked but 
once. 

3. All calculations were executed and checked by a 20-place Monroe 
split-dial calculator and a 20-inch slide rule. Calculations were 
checked until identical answers were secured to an accuracy of two 
decimal places. 


4. Transcriptions of results and typing of tables have been checked 
by reverse reading. 


VI. RESULTS OF THE STUDY 


A. Random Sampling of Groups.—The total enrollment in each 
test group is given in Table I. 


Test Group A answered the examination with direction set A; 
Test Group B answered the examination with direction set B; 
Test Group C answered the examination with direction set C; 
Test Group D answered the examination with direction set D; and 
Test Group F answered the examination with direction set F. 


The instructor kept a semester achievement record consisting of 
bi-weekly test scores, grades on written papers, and class recitation 
estimates. The test scores consisted of both essay and objective 
test results. The achievement record was reduced to an individual 







student index basis, and used as the criterion for checking the random
selection of the individuals in each of the test groups for each of the
four test periods. Table II shows the values of the means and standard
deviations respectively in every test group for each of the test
periods. In no case is the critical ratio value as great as three. The
conclusion is that the test groups A, B, C, D and F respectively during
each test period were not significantly different so far as the means
and standard deviation values were concerned.


TABLE II.—THE CRITICAL RATIOS BETWEEN THE MEAN AND THE STANDARD
DEVIATION VALUES RESPECTIVELY, OF THE INSTRUCTOR'S TERM
ACHIEVEMENT INDICES FOR STUDENTS GROUPED BY DIRECTION
SETS A, B, C, D, AND F PRECEDING THE 160-ITEM TRUE-
FALSE EXAMINATION GIVEN DURING JANUARY AND JUNE
OF 1930 AND 1931

                          Critical ratio values¹ between each set of directions for

                                    Means                       Standard deviations
Dates of         Direction  Index                            Index
examination      sets       values    A     B     C     D    values    A     B     C     D

January, 1930    A           40.5    ...   2.6   2.7   2.8     9.1    ...   1.5   1.7   0.5
                 B           37.8    ...   ...   0.1   0.5     8.0    ...   ...   0.2   1.9
                 C           37.8    ...   ...   ...   0.4     7.9    ...   ...   ...   2.1
                 D           37.3    ...   ...   ...   ...     9.5    ...   ...   ...   ...
June, 1930       A           36.1    ...   1.4   1.0   0.3    10.1    ...   1.9   0.1   0.1
                 B           38.1    ...   ...   0.3   1.0     8.2    ...   ...   1.7   1.9
                 C           37.7    ...   ...   ...   0.6     9.9    ...   ...   ...   0.3
                 D           36.6    ...   ...   ...   ...    10.2    ...   ...   ...   ...
January, 1931    A           35.1    ...   1.4   0.1   1.7     8.7    ...   1.2   0.1   0.9
                 B           33.5    ...   ...   1.5   0.3     9.7    ...   ...   1.2   0.3
                 C           35.1    ...   ...   ...   1.8     8.8    ...   ...   ...   0.8
                 D           33.1    ...   ...   ...   ...     9.5    ...   ...   ...   ...
June, 1931       A           61.9    ...   0.2   0.1²  ...    15.2    ...   0.4   0.3²  ...
                 B           61.5    ...   ...   0.3   ...    15.7    ...   ...   0.8   ...
                 F           62.1    ...   ...   ...   ...    14.7    ...   ...   ...   ...

1 The difference is insignificant if the critical ratio value is less than three.
2 These values are for direction set F and not set C.




B. Time of Administration of the Examination to Each Test Group.— 
A stop watch was used to record the time interval to the nearest second 
between the moment of starting the examination and the moment the 
student handed the answered examination to the instructor. (It 
should be noted that about one minute of time elapses between the 
moment a student finishes the examination, picks up his books and 
wraps, and delivers the examination to the instructor.) 


TABLE III.—A COMPARISON OF THE MEANS AND STANDARD DEVIATIONS OF THE
TIME REQUIRED TO ADMINISTER THE 160-ITEM TRUE-FALSE EXAMINATION
PRECEDED BY DIRECTION SETS A, B, C, D AND F RESPECTIVELY

                            Direction sets and time in minutes
Measures                     A        B        C        D        F

N......................     202      196      131      116       55
Mean...................    24.9     25.3     26.3     25.2     25.3
Standard deviation.....     6.3      6.2      6.4      5.5      5.8




















Table III indicates that on the average, there is a range of approximately
1½ minutes between sets A and D. The standard deviation
between sets A and D is nearly one minute. So far as this study is 
concerned, the time difference between the random test groups is 
statistically insignificant, from the standpoints of either central 
tendency or variability. 
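
The claim of insignificance can be illustrated from the figures of Table III. The sketch below is written in Python and rests on two assumptions not stated in the article: that the first row of Table III gives the number of students timed, and that the critical ratio is the difference of two means divided by the standard error of that difference. The names `TIMES` and `critical_ratio_of_means` are introduced here for illustration.

```python
from itertools import combinations
from math import sqrt

# (number of students, mean minutes, standard deviation) per direction set,
# as read from Table III. Treating the first row as N is an assumption.
TIMES = {
    "A": (202, 24.9, 6.3),
    "B": (196, 25.3, 6.2),
    "C": (131, 26.3, 6.4),
    "D": (116, 25.2, 5.5),
    "F": (55, 25.3, 5.8),
}


def critical_ratio_of_means(first, second):
    """Difference of two means divided by the standard error of the difference.

    This formula is an assumed convention; the article does not state
    how its critical ratios were computed.
    """
    n1, mean1, sd1 = first
    n2, mean2, sd2 = second
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / standard_error


if __name__ == "__main__":
    for a, b in combinations(TIMES, 2):
        ratio = critical_ratio_of_means(TIMES[a], TIMES[b])
        print(f"{a} vs {b}: {ratio:.2f}")
```

Under these assumptions even the widest gap, that between sets A and C, gives a ratio below the value of three which the authors treat as the threshold of significance.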


TABLE IV.—THE CONSISTENCY OF THE 160-ITEM TRUE-FALSE EXAMINATION
PRECEDED BY DIRECTION SETS A, B, C, D, AND F, AND ADMINISTERED
DURING JANUARY AND JUNE OF 1930 AND 1931

            r-values and dates of administration of 160-item true-false examination
                                                                                         Composite of the
                                                                                         four dates of
Direction   January, 1930    June, 1930       January, 1931    June, 1931       Medians¹          examination
sets         N    r    PE     N    r    PE     N    r    PE     N    r    PE     N    r     PE     N     r    PE

A           63   .74  .04    41   .85  .03    70   .73  .04    56   .62  .06    59   0.73  .04    230   .76  .02
B           62   .63  .05    35   .79  .04    68   .84  .02    63   .53  .06    63   0.71  .04    228   .82  .01
C           62   .70  .04    34   .60  .07    62   .82  .03    ..   ...  ...    62   0.70  .04    158   .74  .03
D           57   .70  .05    33   .65  .07    57   .77  .04    ..   ...  ...    57   0.70  .05    148   .73  .03
F           ..   ...  ...    ..   ...  ...    ..   ...  ...    56   .68  .05    56   0.68  .05     56   .68  .05

1 These calculations are not completely statistically justifiable, and only indicate median tendencies
as a rough check against the r-values of the composite of the four dates of examination.










C. The Relative Consistency of the Examination When Administered
with Different Sets of Directions.—The relative consistency¹ of the
examination for each test group during each of the four testing periods
is given in Table IV. There is considerable variation between the
coefficients (Pearson-r) within any one given testing period. The
greatest crude difference in the June, 1930, testing period between
A (.85) and C (.60) is 0.25. The critical ratio of this difference is
approximately 3.3 and may be considered as significant. Other
differences are insignificant. The tendencies toward insignificance of
differences between the r-values are revealed among their medians.
Their range is from 0.68 to 0.73. This is confirmed in the composite
r-values of the four test periods. Their range is 0.68 to 0.82.

1 “Consistency” is preferred by the authors to “reliability.”


TABLE V.—THE CRITICAL RATIOS BETWEEN THE MEAN AND THE STANDARD
DEVIATION VALUES RESPECTIVELY, OF THE CRUDE TEST SCORES FOR
STUDENTS GROUPED BY DIRECTION SETS A, B, C, D AND F
PRECEDING THE 160-ITEM TRUE-FALSE EXAMINATION GIVEN
DURING JANUARY AND JUNE OF 1930 AND 1931

                          Critical ratio values¹ between each set of directions for

                                       Means                              Standard deviations
Dates of         Direction  Test                                      Test
examination      set        scores         A      B      C       D    scores         A     B     C      D

January, 1930    A          114.5         ...   19.6   13.3    12.2    7.5          ...   0.4   0.7    0.8
                 B           96.3         ...    ...   32.0    30.4    7.8          ...   ...   0.3    0.4
                 C          [illegible]   ...    ...    ...     0.6   [illegible]   ...   ...   ...    0.1
                 D          [illegible]   ...    ...    ...     ...    8.1          ...   ...   ...    ...
June, 1930       A          110.1         ...   11.5   13.7    12.5    9.6          ...   1.7   2.5    1.3
                 B           94.5         ...    ...   27.1    24.9    7.9          ...   ...   0.8    0.3
                 C          [illegible]   ...    ...    ...     0.7   [illegible]   ...   ...   ...    1.1
                 D          [illegible]   ...    ...    ...     ...    8.3          ...   ...   ...    ...
January, 1931    A          113.4         ...   18.3    9.1     7.6    8.6          ...   2.5   1.2    1.2
                 B           93.1         ...    ...   25.4    23.6   10.5          ...   ...   1.2    1.1
                 C          [illegible]   ...    ...    ...     1.0   [illegible]   ...   ...   ...    0.4
                 D          [illegible]   ...    ...    ...     ...    9.6          ...   ...   ...    ...
June, 1931       A          107.3         ...   18.1  135.1²    ...    6.9          ...   1.5   3.18²  ...
                 B          [illegible]   ...    ...   67.2     ...    6.0          ...   ...   0.4    ...
                 F          [illegible]   ...    ...    ...     ...    5.8          ...   ...   ...    ...

1 The difference is insignificant if the critical ratio value is less than three.
2 The values are for direction set F and not set C.


So far as this study is concerned, the variations in consistency
of the examination under different sets of directions are relatively
insignificant.
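
The critical ratio of approximately 3.3 quoted in Section C for the June, 1930, coefficients of sets A and C can be reproduced from the probable errors of Table IV. The Python sketch below assumes that the ratio is the difference of the two coefficients divided by the probable error of that difference, the latter taken as the root of the sum of the squared probable errors. The article does not state this formula; the function name `critical_ratio_from_pe` is introduced here for illustration.

```python
from math import sqrt


def critical_ratio_from_pe(value1, pe1, value2, pe2):
    """Difference of two statistics divided by the probable error of the difference.

    Assumes independent samples, so that the probable error of the
    difference is sqrt(pe1**2 + pe2**2). This convention is an assumption.
    """
    return abs(value1 - value2) / sqrt(pe1 ** 2 + pe2 ** 2)


if __name__ == "__main__":
    # June, 1930 (Table IV): set A, r = .85, PE = .03; set C, r = .60, PE = .07.
    ratio = critical_ratio_from_pe(0.85, 0.03, 0.60, 0.07)
    print(round(ratio, 1))
```

That the assumed formula returns the printed figure suggests, without proving, that the critical ratios in this article are expressed in probable-error units.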

D. The Relative Variabilities of the Examination with Different Sets
of Directions.—Table V indicates the standard deviations and their
respective critical ratio values for each direction set with each other set
of directions. All of these differences are insignificant with one possible
exception. Set F with Set A shows a critical ratio value of 3.18;
however, their crude standard deviation difference is only 1.1 points
of score. Set F was administered but once, and with this single exception,
all other critical ratio values for the remaining sets of directions,
each administered three times, are consistently much less than a value
of three.

So far as this study is concerned, the standard deviation values of
the examination under different sets of directions are not significantly
different, with the possible exception of direction Set F.

E. The Variability of the Mean of the Examination with Different
Sets of Directions.—Table V indicates that the mean values of the
examination vary under the influence of certain sets of directions.
Before considering these differences, certain facts must be noted:

1. D was the control direction set. The key for this set was determined by
conferences of the authors of this article, and by utilizing other available keys,
published authority and personal judgment. The instructor (Newens) who
taught the course used direction set D preceding all test materials based upon the
content of that course.

2. C was a direction set recommended for experimental purposes by the author 
(Weidemann). The key for this set was determined by a group of twenty-one 
instructors in the history of education of the United States. 

3. B was a direction set recommended for experimental purposes by the fore- 
going author. The key was determined in the same way as the key for set C. 

4. A was a direction set recommended for experimental purposes by the fore- 
going author. The key for this set was determined by interpolation from the keys 
of direction sets C and B. 

5. F was the direction set evolving from the responses of the twenty-one foregoing
instructors. The key was determined upon the basis of arbitrary definitions²
of per cent of agreement of responses of the instructors to each test item.

6. In the foregoing sets the decision scale from entirely true to entirely false 
was for 





1 Weidemann, C. C. and L. F. Newens: Loc. cit., p. 199. 
2 Weidemann, C. C. and L. F. Newens: Loc. cit., p. 205.







(a) Set D undefined. 

(b) Set C divided into two parts at the point of doubtful as to whether either 
true or false. 

(c) Set A divided into two parts at a point between entirely true and any part 
or all of the item being false. 

(d) Set B divided into three parts. (See Section IV.) 

(e) Set F divided into five parts. (See Section IV.) 

In Table V it is noted that the means of

Set D range between approximately 122 and 127;

Set C range between approximately 123 and 128;

Set A range between approximately 107 and 115;

Set B range between approximately 93 and 96; and

the mean of Set F is forty-four.


The conclusion is that as the accuracy of definition and divisions 
of the decision scale increase, the mean values decrease. This is a 
significant fact. The opportunity for guessing a right answer under 
either set F or B is less than under sets A, C and D respectively. Set 
F actually offers five choices by recall for a one response decision to 
any given item and in a sense is similar to a five-choice multiple choice 
item by recognition requiring a one-response decision. 
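
The remark on guessing can be made concrete with a simple chance model. The Python sketch below is not drawn from the article: it assumes that a student with no knowledge marks each of the 160 items at random, with equal probability for every mark his direction set permits, and that sets A, C, and D each permit two decisions, set B three, and set F five. The names `ITEMS`, `CHOICES`, and `expected_right_by_guessing` are introduced here for illustration.

```python
ITEMS = 160  # length of the examination as stated in the article

# Number of distinct decisions each direction set allows per item. Under
# set D the two decisions are "mark with a zero" and "leave unmarked".
CHOICES = {"A": 2, "B": 3, "C": 2, "D": 2, "F": 5}


def expected_right_by_guessing(items, choices):
    """Expected number right when every item is marked uniformly at random."""
    if choices < 1:
        raise ValueError("at least one choice is required")
    return items / choices


if __name__ == "__main__":
    for name, k in CHOICES.items():
        print(name, round(expected_right_by_guessing(ITEMS, k), 1))
```

On this model the chance score falls as the decision scale is divided more finely, which is the direction of the differences in means reported above. The model says nothing about how far actual students guessed.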

The foregoing ranges are strikingly small for the values of means 
for any given set of directions during each of the four periods of 
examination. 

The critical ratio of the difference of means over the foregoing 
ranges is 


for set D the value of 4.5 
for set C the value of 4.7 
for set B the value of 4.2 
for set A the value of 8.1 


These significant ratio findings indicate a rather high degree of consistency
of achievement and instruction between different testing
periods with different students, same course, same examination and 
same instructor. It might be expected that the foregoing critical 
ratio values might be much greater. 

The critical ratio of the difference of the standard deviations for 
the distributions having the same foregoing mean values is 


for set D the value of 1.4
for set C the value of 2.5 
for set B the value of 3.1 
for set A the value of 0.9 


These values in the main show insignificant differences of variation
and, therefore, a high degree of consistency in achievement and instruction
between different testing periods, with different students, same
course, same examination and same instructor.

The major differences in the values of the means are not between
the means of corresponding sets of directions in different testing periods,
but between the means of different sets of directions within each given
testing period.


Again, Table V reveals the following facts: 


1. The differences between the means of sets C and D are insignificant. The 
critical ratio values are 0.6, 0.7 and 1.0. This means that Set C of directions used
by the instructor is no better or worse than the control Set D. It seems that the 
achievement and instruction of the content of the course made either set of direc- 
tions, C or D equally applicable to the examination. 
2. The differences between the means of sets

(a) D and A are significant. The critical ratio values are 12.2, 12.5, and 7.6.

(b) C and A are significant. The critical ratio values are 13.3, 13.7 and 9.1. 
These results mean that the achievement and instruction of the content of the 
course is such that set A is significantly not applicable to the examination. 

3. The differences between the means of sets 

(a) D and B are significant. The critical ratio values are 30.4, 24.9, and 23.6. 
Set B is even less applicable to the examination than is Set A. 

(b) C and B are significant. The critical ratio values are 32.0, 27.1, and 25.4. 
Set C is not more applicable than Set B. 

4. The differences between the means of sets A and B are significant. The 
critical ratio values are 19.6, 11.5, 18.3, and 18.1 in favor of A in every test period. 

5. The difference between the means of set 

(a) A and F is significant. The critical ratio value is 135.1 in favor of A. 

(b) B and F is significant. The critical ratio value is 67.2 in favor of B. This
means that Set F is by far the least applicable to the examination of any of the 
directions sets. 


Set F is supported by a scoring key based upon the grouping of judgments of 
twenty-one instructors in the history of education. 


These facts are not intended necessarily to discredit any given 
set of directions. It is easily conceivable that the nature of the instruc- 
tion and learning in the course might have been such that the highest 
mean values might have resulted from the use of directions Set F 
or B or A. 

The outcome of these facts is: use a set of directions preceding a
true-false and indeterminate statement examination whose definitions
for response on the decision scale correspond to the nature of the instruction
for each item of the course. Then, discard such sets of directions
as D, which do not clearly state definitions for response on the decision
scale.











AN ARGUMENT FOR CENTILE RANKS 


DAVID C. ROGERS 
Smith College, Northampton, Mass. 


The writer has recently been busy with statistics pertaining to 
academic grades. To him it seems a matter of great urgency that 
grades given in American colleges be more definitely standardized. 
Any plan for the purpose with which he is familiar involves the 
assignment to individuals of proportionate positions in groups. 

The need for improved methods of grading and comparison extends 
much farther than college marks. Intelligence scores, entrance 
examination ratings, vocational test grades, and, beyond the provinces 
of education and psychology, the whole range of practical problems in 
which statistics are now being used—in all of these fields understanding 
and action are greatly advanced, when investigators, readers, and 
administrators come to think of individuals—among many other ways 
—in terms of their relations to proportions of groups. 

The statisticians now have a number of devices intended for meet- 
ing this need—the normal curve of distribution, sigma deviations, the 
ogive curve, percentile ranks, and others. But there are instances of 
psychologists, biologists, and mathematicians, unfamiliar with any 
one of these devices. In numerous professional and industrial fields 
in which these methods would be profitable, large proportions of 
workers continue inexperienced in their use. 

At a time when these procedures were less familiar than at present, 
there was already in quite common application a very simple device for 
accomplishing the same end in a rough way. Under this old popular 
plan, the whole group within which comparative positions were to be 
indicated was regarded as made up of individuals arranged in rank 
order, and as divided into halves, fourths, fifths, tenths, or some other 
number of equal sections. The position of the single individual as 
related to others in the total group, then, was indicated by a statement 
that he stood, for instance, in the upper half, in the third fifth, or 
in the seventh tenth. Readers were not badly puzzled by this pro- 
cedure, administrators used it without difficulty, statisticians found it 
serviceable. 

Among the current methods of statisticians, the one which stands 
closest to this older usage is that of ‘‘percentile ranks.”” Some of the 
statements explaining this method, indeed, seem to describe it as a 







direct application of the old procedure, and some of the forms in which 
it has been used agree with the same conception. 

Thurstone, for example, in his excellent ‘‘ Fundamentals of Statis- 
tics,” writes as follows: ‘‘In order to avoid the necessity of stating the 
number of cases involved in designating a person’s relative position by 
rank, one may express the absolute rank in terms of percentile rank. 
In this case one states the rank that the person would have if there 
were one hundred members in the group.” On the succeeding page 
he refers to the hundredth member in a group of one hundred as having 
“a percentile of one hundred.”¹

Similarly Whipple, in Supplement No. 3 to the Manual of Direc- 
tions of the National Intelligence Tests (1924), in giving plans for 
letter-ratings for these tests, furnishes tables in which the word per- 
centile stands for a group and in which the population is divided into a 
hundred percentiles numbered from one to one hundred. 

Along with the procedure implied in these two citations, however, 
there exists a somewhat different conception. The same two authors 
can be cited in illustrating it. 

Thus Thurstone, in the same chapter from which we have already 
quoted, makes the statement: ‘‘A percentile rank is so calculated that 
it indicates the per cent of the group which ranks below the specified 
percentile. A person who has a percentile rank of seventy-two exceeds 
seventy-two per cent of the group and is exceeded by twenty-eight 
per cent of the group,” and on a later page, “ . . . it is theoretically 
impossible for any individual to have a percentile rank of zero or a
percentile rank of one hundred.”’ 
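
The second conception, in which a rank states the per cent of the group falling below, may be sketched as follows. The Python below uses one common convention, counting half of the individuals tied at a score as falling below it; this convention is an assumption chosen because it never yields zero or one hundred, in keeping with the quotation, and it is not offered as Thurstone's own computing formula. The function name `percentile_rank` is introduced here for illustration.

```python
def percentile_rank(score, group):
    """Per cent of the group ranking below a score, counting half of any ties.

    The midpoint treatment of ties is an assumed convention. With it, a
    member of the group can never receive exactly 0 or exactly 100.
    """
    if not group:
        raise ValueError("the group must not be empty")
    below = sum(1 for s in group if s < score)
    equal = sum(1 for s in group if s == score)
    return 100.0 * (below + 0.5 * equal) / len(group)


if __name__ == "__main__":
    group = list(range(1, 101))  # one hundred individuals with distinct scores
    print(percentile_rank(1, group))
    print(percentile_rank(73, group))
    print(percentile_rank(100, group))
```

Under this convention the hundredth member of a group of one hundred stands just short of one hundred, whereas under the first conception he is assigned "a percentile of one hundred."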

Whipple, also, in an article discussing the National Intelligence 
Tests, uses a diagram in which the percentiles are a series of points, 
running from zero at the lower limit of the distribution to one hundred 
at the upper limit.²

In the first perusal of these citations, it will presumably not be 
clear to all readers just what difference there is between the first pair 
and the second pair to justify an argument. The writer is not aware 
that other workers have found a discrepancy so grave as to disturb 
them. His reason for stirring up trouble is his belief that there is 
involved here a distinct method which has heretofore received only 





1 Thurstone, L. L.: ““The Fundamentals of Statistics.’” Macmillan, New York, 
1925, Chap. 12. 

2 Whipple, Guy M.: The National Intelligence Tests. Journal of Educational 
Research, Vol. VI, 1921, pp. 16-31. 





casual recognition and which is capable of being adapted to meet the
need to which earlier paragraphs have referred. Shall he attempt,
then, as a next step, to show that there are two distinguishable methods
with important differences between them? Occasional reference to
Fig. 1 will aid the reader in following the discussion.

[Fig. 1.—Comparison of Two Ranking Methods. (For the hundredths of the
discussion, tenths are substituted in this diagram; for thousandths,
hundredths.) Scales shown: Decile Points; Decile Ranks, Plan b; Decile
Rank Groups, Plan a.]







Shall the first method, provisionally, be called “plan a”? Plan a,
following the principle of a popular tradition, regards a given collection
of individuals as arranged in rank order, divides the array into a hundred
successive equal groups, and ranks any single individual by stating
the number of the group in which he falls. The division points between
these groups correspond to the “percentiles” or “percentile points” of
current nomenclature. In present literature this plan mainly fills the
place of a makeshift for assisting the supposedly faltering student into
the relatively difficult mysteries of plan b, or for simplifying tabulations
at points in which plan b would require complexities.

Plan b, on the other hand, is a matter of technical statistical tradition.
It comes down at least from the time of Galton.¹ In contrast
with plan a, which starts with a division of the arranged total group 
into a definite set of a hundred sub-groups, plan b begins with the 
principles that, the individuals of the group having been arranged in 
order, they can be regarded as divided at any distance between the 
lower and upper limits of the whole, that this distance can be stated 
as a proportion of the whole collection, and that the point of any such 
division can be used to indicate the position (the percentile rank) of 
some single individual in relation to the rest of the group. 

In plan b, as well as in plan a, use is commonly made of the percentile
points as the basis for a primary rank series. Along with the
percentiles proper (that is, the true percentiles, numbered from one to
ninety-nine), reference is often made in the discussion of plan b to
percentile zero, the lower limit of the total group, and percentile one
hundred, the upper limit. In this plan an individual is commonly given
as rank number a whole number representing the percentile point upon
which or near to which he stands; or, as in a variant usage, the total
group is counted as one, the percentile ranks are regarded as proportions
of one, and they are written as two-place decimals (as .01, .02, etc. to
.98 and .99).

At the same time it is recognized in plan b that percentile ranks
are not limited to numbers of two places but can be written in three
places or more (as . . . 10.5, . . . 99.5, . . . 99.9, . . . 99.95, . . .
99.99, and in the second form . . . .105, . . . .995, . . . .9995,
. . . .9999).


Through the remainder of this paper, ranks of plan b will in all
cases be written in the second of the two forms referred to above, i.e. as
decimal fractions of one.





1 Galton, Francis: “Natural Inheritance.” London, 1889, pp. 37 et seq.







Instead of being employed as limits between percentile groups, as 
in plan a, the percentile points are, in plan b, in effect, the centers of 
percentile groups. Using the percentiles proper, from one to ninety- 
nine, the scheme furnishes ninety-nine percentile groups of equal size. 
Two groups of half size but particular importance remain, one at each 
end of the array. When accuracy is stressed, there are recognized, as 
basic for work under this plan, not only the percentile points at .01, 
.02, .03, etc. but also the limiting points between percentile groups, viz.
.005, .015, .025, . . . .985, .995.
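The contrast between the two plans can be sketched in a few lines, as an illustration only. The sketch assumes that each individual is located at the center of his own range in the ordered array (the convention of Fig. 1); the function names `plan_a_group` and `plan_b_rank` and the collection size of 1,000 are hypothetical choices made for the example, not part of the original discussion.

```python
from collections import Counter
from fractions import Fraction


def position(i, n):
    """Midpoint position of the i-th individual (1-based) as a fraction of the array."""
    return Fraction(2 * i - 1, 2 * n)


def plan_a_group(i, n):
    """Plan a: number (1..100) of the hundredth in which the individual falls."""
    return int(position(i, n) * 100) + 1


def plan_b_rank(i, n):
    """Plan b: nearest percentile point, counted in hundredths (0..100)."""
    return int(position(i, n) * 100 + Fraction(1, 2))


n = 1000  # illustrative collection size (an assumption)
a_sizes = Counter(plan_a_group(i, n) for i in range(1, n + 1))
b_sizes = Counter(plan_b_rank(i, n) for i in range(1, n + 1))

# Plan a: one hundred groups of equal size.
print(len(a_sizes), set(a_sizes.values()))
# Plan b: ninety-nine full groups and a half-size group at each end.
print(b_sizes[0], b_sizes[1], b_sizes[99], b_sizes[100])
```

Exact fractions are used so that no individual is misplaced by floating-point rounding at a group limit.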

In the upper and lower hundredths of the array some variety of 
procedure has occurred. In common practice it is held that, each 
individual having finite range and being ranked as if located at the 
center of this range (see Fig. 1), there can be no individual with exactly 
the percentile rank of zero or 1.00. Through ninety-eight or ninety- 
nine hundredths of the array it is held that an individual is to be given 
as rank the number of a percentile point which is anything less than 
.005 of the total distribution distant from it. Uniformity would indi- 
cate that, the understanding being that ranks are approximate only, 
zero could be used as rank number for the lowest two-hundredth of 
the cases and 1.00 as rank number for the highest two-hundredth, or 
some appropriate phrases could be substituted, as “near-0,” “near-1.00.”
No such usage, however, is common. Partly to obviate the
giving of rank 0 or 1.00, partly as an adaptation to the fact that score 
distances are much greater per unit of rank near the ends of the dis- 
tribution than near the middle, the practice has been frequently 
adopted of using an added decimal number in ranking an individual 
in the lowest or highest hundredth or two-hundredth. If this 
method is carried out with precision, percentile interval .01 is 
counted as extending from .005 to .015, and there are set off below 
it rank groups: .001, .002, .003, .004, part of .005, and a small 
group below .001. Above .99, similarly, representing the interval 
.985 to .995, there are used groups: .995 (part), .996, .997, .998, .999,
and a small additional group above .999. More commonly a sacrifice 
of consistency in the interest of practicability is allowed, and dis- 
crepancies of .005 or .0005 are overlooked while an approximate 
division of the terminal cases by thousandths is obtained under some 
simpler rule. 

In a strikingly variant usage under plan b, the one highest individual 
is given percentile rank 1.00 and the one lowest individual rank 0, 
regardless of the size of the collection. 













Assuming that the reader may now be convinced that two dis- 
tinguishable plans are really implied in the literature, it is perhaps time 
that a name of its own be given to this plan a for which rescue is being 
attempted. For this purpose the name “centile ranks” is proposed,
and will be used in the following discussion. The more common plan
of current usage, plan b, will be referred to, in contrast, as the “standard
percentile ranks.”

In a comparison of the relative merits of the two plans, the standard 
percentile plan may appear to offer two important advantages: 

1. Its rank-numbers explicitly stand for points in the array of 
individuals, instead of intervals. Points in the percentile rank scale 
can be exactly matched against sigma-deviations, positions on an ogive 
curve, and other significant statistical measures. Used in connection 
with these other terms the percentile points and other points of the 
same decimal series have played an important part in the statistical 
treatment of distributions. 

2. As contrasted with a proposed convention which would ordi- 
narily furnish ranks merely in terms of hundredths of a total group, and 
would require a special extension or explanation if smaller divisions 
are to be recognized, the standard percentile ranking system furnishes 
an elastic arrangement under which divisions of any desired degree 
of refinement are used as a matter of regular practice. 

The importance of these features is obvious, and the value of the 
general procedure in which any distribution of objects arrayed in rank 
order is thought of as divisible into tenths, hundredths, thousandths, 
and all of the finer sub-divisions of the complete decimal system, will 
be unquestioned. Indeed, the full explanation of the centile plan for 
ranking could hardly be given without employing this procedure, and 
the centile ranks are properly regarded as being no more than a special 
conventional device within the more comprehensive framework of the 
decimal division system. 

With the obvious indispensability of the general decimal system as
a procedure adaptable to the division of any collection of objects fully
conceded, it can still be argued that for uses in which there is a question
of ranking, whether with view to the practical disposal of individuals 
or to summaries which group individuals in graded classes, the con- 
ventions of the centile ranks offer advantages so substantial as to 
recommend them for wide adoption. 

The argument in favor of the centile ranks can be suitably devel- 
oped now by reference to the same two questions which have been 







mentioned in a previous paragraph as appearing to indicate superior- 
ities for the standard percentile ranks. 

First in order, the question of the relative positions in the two plans 
of points and intervals. 

For making transitions from ranks to other forms of series, and for 
numerous other purposes, points as contrasted with intervals will be 
the necessary terms. The actual difference between the two con- 
trasted ranking systems, however, is not that one uses points and the 
other intervals, but that in the standard system the percentile points 
are at the midpoints of intervals and in the centile system they are at 
the division points between intervals. For the standard percentile 
system there are three-place numbers at the division points. For the 
centile system three-place numbers are rarely needed, since its own 
number-series, from one to one hundred, represents its groups for 
almost all purposes. If, for occasional requirements, in ranking by 
the centile plan, numbers for relating the midpoints of rank-groups to
score numbers or to points in the normal curve of distribution are 
needed, three-place numbers from the decimal system of division would 
naturally be used. In bare simplicity of numerical nomenclature the 
advantage seems to stand on the side of the centile ranks. 

With percentile points at mid-points, as in the present standard
system, ninety-nine groups of full size and two of half-size are provided.
With percentile points at division points, as in the centile
system, there are one hundred groups of equal size.

When hundredths of a collection computed under the standard
system are combined into tenths, fifths, fourths, or halves, either
complex forms are required in calculation and generally in statement,
or errors of .005 of the total group are concealed. In the centile system
these combinations are made directly with precise results.
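A small illustration of this point, offered as a sketch under stated assumptions: the collection is taken to contain 1,000 individuals, each located at the center of his own range, and the lowest tenth is formed in the most direct way under each system. The names `centile_group` and `standard_rank` are hypothetical.

```python
from fractions import Fraction

N = 1000  # illustrative collection size (an assumption)


def position(i, n=N):
    """Midpoint position of the i-th individual (1-based) as a fraction of the array."""
    return Fraction(2 * i - 1, 2 * n)


def centile_group(i, n=N):
    """Centile rank: number (1..100) of the hundredth containing the individual."""
    return int(position(i, n) * 100) + 1


def standard_rank(i, n=N):
    """Standard percentile rank: nearest percentile point, in hundredths (0..100)."""
    return int(position(i, n) * 100 + Fraction(1, 2))


# Lowest tenth, formed by combining the first ten ranks under each system.
tenth_centile = [i for i in range(1, N + 1) if centile_group(i) <= 10]
tenth_standard = [i for i in range(1, N + 1) if standard_rank(i) <= 10]

# Excess of the standard-system "tenth" over a true tenth of the collection.
excess = Fraction(len(tenth_standard), N) - Fraction(1, 10)
print(len(tenth_centile), len(tenth_standard), excess)
```

Under these assumptions the centile groups one to ten hold exactly a tenth of the cases, while the standard ranks up to .10 reach as far as .105 of the array, so that the concealed error is .005 of the total group.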

With percentile points at division points, there is eliminated the
problem of a special method for specifying ranks in the lower and upper
two-hundredths, which has been a troublesome one for the standard
percentile rank system.

In the first question at issue, then, there has appeared a large
economy on the side of the centile system. Will a counter-balancing
advantage for the standard percentile system be found if the question
is argued of the extension of the ranking plan from hundredths to
thousandths or to more advanced stages in decimal subdivision?

In the use of ranked divisions for indicating distributions of scores
for general scientific purposes, groups smaller than hundredths have
















rarely if ever been needed. More often the groups of hundredths of 
the total collection which the recorded data have furnished, have, for 
purposes of scientific summary, been combined into groups of tenths 
or of fifths. 

The only use of three-place numbers in the finished products of 
investigation that has been at all frequent has been for ranking indi- 
viduals above percentile 99 and below percentile 1. Here they have 
been employed most often not because the reliability of the data or the 
interests of the investigator justified the greater degree of refinement, 
but because in the standard system no simple convention for indicating
a rank between 0 and .005 and between .995 and 1.00 had been agreed 
upon. 

And the three-place decimal ranks are not so complete a solution 
for the upper and lower limits of the distribution as at first appears. 
Beyond the three-place numbers the lower and upper two-thousandths 
of the distribution are at this stage of division left unnamed; so if more 
than one thousand cases are to be ranked, there will be no ranks for the 
lowest and highest cases unless four-place numbers are used. These 
ranks in turn will leave the lower and upper twenty-thousandths 
unprovided for, and so on. 

Cases may, no doubt, exist in which an investigator has had con- 
crete and important data so reliable and refined that two-place rank 
numbers would have been inadequate, and has deliberately chosen 
the method of rank numbers of three or more places for reporting his 
special facts. If so they have been few. For much the greater part 
of the field now taken by percentile ranks, the two-place numbers of 
the centile rank method will be adequate. 

Suppose thousandths or ten-thousandths are to be recognized 
where special data make them significant. Thousandths under 
the standard system are reached by dividing each hundredth into 
nine thousandths and two two-thousandths, ten-thousandths by 
dividing each thousandth into nine full parts and two half-parts, 
and so on. Under the principle of the centile system thousandths 
are obtained by dividing each hundredth directly into ten equal 
parts, finer subdivisions similarly. The phrases needed in the centile 
system, for indicating the special procedure, are a small matter as 
compared with the advantage in simplicity. 
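The two modes of subdivision can be checked with exact fractions. The following is a sketch only: it takes the standard hundredth centered on .01 (running from .005 to .015) and the second centile hundredth (running from .01 to .02) as representative cases, and the helper `overlap` is a hypothetical name.

```python
from fractions import Fraction


def overlap(lo1, hi1, lo2, hi2):
    """Length of the part common to the intervals [lo1, hi1] and [lo2, hi2]."""
    return max(Fraction(0), min(hi1, hi2) - max(lo1, lo2))


# Standard system: thousandth groups are centered on the points k/1000.
h_lo, h_hi = Fraction(5, 1000), Fraction(15, 1000)
standard_parts = {}
for k in range(0, 1001):
    center = Fraction(k, 1000)
    lo = max(Fraction(0), center - Fraction(1, 2000))
    hi = min(Fraction(1), center + Fraction(1, 2000))
    width = overlap(lo, hi, h_lo, h_hi)
    if width > 0:
        standard_parts[center] = width

# Centile system: thousandth groups lie between the points j/1000 and (j+1)/1000.
c_lo, c_hi = Fraction(1, 100), Fraction(2, 100)
centile_parts = {}
for j in range(1000):
    width = overlap(Fraction(j, 1000), Fraction(j + 1, 1000), c_lo, c_hi)
    if width > 0:
        centile_parts[j + 1] = width

print(sorted(standard_parts.values()))
print(sorted(centile_parts.values()))
```

The first list should show two parts of one two-thousandth and nine of one thousandth; the second, ten equal parts of one thousandth each.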

For the main purposes of ranking, so it may be concluded, the 
question of refinement will furnish no obstacle to the availability of 




ranks of the centile form, but will indicate additional advantages on 
the side of this method. 

Against the proposed plan, to be sure, the objection could be made 
that to introduce a new convention departing by a small difference 
from an established tradition would cause unnecessary and troublesome 
confusion. The answer is that ambiguities considerably larger than 
the largest that would arise between the centile system and the form 
of percentile system which has been described above as standard, 
are already involved in the graphical methods and varying formulae 
which are prescribed for the practical application of the standard 
system in current text-books. The objection would not be a grave 
one.¹

For an indefinite range of applications, then, in the treatment of 
ordered distributions, the system of decimal division points which the 
standard percentile ranking system has employed, serves necessary 
purposes which the conventions of the centile ranks do not in any way 
displace. For the main routine of ranking individuals and of using 
ranks as a means toward the classification of distributions, the conven- 
tions of the centile ranks would seem to be a genuine aid to convenience, 
clearness, and accuracy. 

In availability for popular use the contrast between the two types 
of ranks will be still greater. A system which involves such intricate 
complexities that statisticians after forty years experience with it still 
disagree about its details and lapse into inconsistencies and ambiguities 
in its use, is not adapted to be an instrument for popular application. 
The centile system contains nothing, so far as appears, to baffle either 
scientist or untrained worker. 

Against the use of ranks, whether numerical, standard-percentile, 
or centile, there are two significant disadvantages. 

In an ordinary distribution, score ranges for rank groups near the 
upper and lower limits of the distribution are a great deal larger than 
score ranges for rank groups of equal size near the median. So, 
for instance, in a theoretically normal distribution, the score range for 
centile group two is thirty-one times as great as the score range for 





1 Cf. Garrett, Henry E.: “Statistics in Psychology and Education.” Longmans,
Green and Co., New York, 1926, pp. 45-49.
Otis, Arthur S.: “Statistical Method in Educational Measurement.” World
Book Co., Yonkers-on-Hudson, 1925, Chaps. 3, 5, 7, 9.
Thurstone, L. L.: “The Fundamentals of Statistics.” Macmillan, New York,
1925, Chap. 16.




centile group fifty, and the score range for the ten centile groups two 
to eleven is four times as great as the score range for groups forty-one 
to fifty. In concrete statistical data similar differences are regularly 
found. Unless proper account is taken of this fact, the converting 
of scores into ranks tends in many usages to a serious over-weighting 
of small differences in score near the median, as compared with larger 
differences in score at greater distances from the median. Particu- 
larly in such processes as the computing of averages and correlations, 
deceptive results are obtained unless corrections for these varying
proportions are made.¹
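The unequal score ranges of equal rank groups can be examined directly for a theoretically normal distribution. The sketch below assumes that centile group g extends from the point (g − 1)/100 to the point g/100 of the array and measures its score range in sigma units; the function name `score_range` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal curve


def score_range(lo_group, hi_group):
    """Width, in sigma units, of centile groups lo_group..hi_group inclusive."""
    return z(hi_group / 100) - z((lo_group - 1) / 100)


# One group near the lower limit against one group at the median.
ratio_single = score_range(2, 2) / score_range(50, 50)
# Ten groups near the lower limit against ten groups next below the median.
ratio_ten = score_range(2, 11) / score_range(41, 50)

print(round(ratio_single, 1), round(ratio_ten, 1))
```

Under this reading the ten-group comparison comes to a little over four, in agreement with the text. The single-group comparison comes to roughly eleven, so the larger figure quoted in the text may rest on a different definition of the groups or on a slip in transcription.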

Another disadvantage in ranks, for groups of the sizes usually 
dealt with, arises from chance irregularities in distribution, producing 
all along the array small irregularities in proportion between steps in 
rank and steps in score values. 

Some statistical workers find these defects so serious that they 
regard percentile ranks of any form as of little value and strongly 
prefer procedures which avoid them. 

Sigma deviation scores have many of the advantages of ranks, and, 
lacking the two stated disadvantages, come naturally into considera- 
tion as a substitute for ranks. 

Against the sigma deviation scores, however, there is the serious 
disadvantage that the distribution of these scores may be strongly 
affected by differences in test instruments and techniques. An excess 
of easy questions in an examination, for instance, will increase devia- 
tions from the mean for individuals low in the scale; an excess of 
difficult questions will increase deviations for individuals high in the 
scale. Sigma deviation scores, in consequence, are arbitrary in a 
sense in which ranks are not, and are subject to a source of error 
which, being unpredictable, might unless watched be more dangerous 
than the distortions to which the regular characteristics of the dis- 





1 Suppose, for example, that in a given collection of measured individuals a 
normal distribution of scores has been found, and that these scores have been 
converted into a standard score series under the plan in which the median is called 
fifty and sigma is counted as ten. Suppose there are two groups of individual 
scores to be averaged separately, four scores in each, viz. in Group A, scores 49, 
51, 73, and 77, and in Group B, scores 57, 59, 60, and 62. The direct average of 
Group A is 62.5, the direct average of Group B is 59.5. Suppose, then, that the 
collection is ranked. The centile ranks for Group A will come out, approximately, 
at 47, 54, 99, and 100, with an average centile rank of 75; Group B will give centile 
ranks, 76, 82, 85, and 89 with an average centile rank of 83. Group B with a lower 
direct average than A gives a considerably higher centile rank average. 
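The arithmetic of this example can be retraced as follows. The sketch assumes an exactly normal distribution with median fifty and sigma ten, and takes the centile rank of a score to be the number of the hundredth in which its cumulative proportion falls; the name `centile_rank` is hypothetical.

```python
from math import floor
from statistics import NormalDist, mean

scale = NormalDist(mu=50, sigma=10)  # standard score series: median 50, sigma 10


def centile_rank(score):
    """Number (1..100) of the hundredth of the distribution containing the score."""
    return floor(scale.cdf(score) * 100) + 1


group_a = [49, 51, 73, 77]
group_b = [57, 59, 60, 62]

ranks_a = [centile_rank(s) for s in group_a]
ranks_b = [centile_rank(s) for s in group_b]

print(mean(group_a), mean(group_b))  # direct averages of the scores
print(ranks_a, mean(ranks_a))        # centile ranks and their average, Group A
print(ranks_b, mean(ranks_b))        # centile ranks and their average, Group B
```

Group A has the higher direct average, yet Group B has the higher average centile rank, which is the reversal the example is meant to show.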




tribution of ranks give occasion. The percentile ranks of either type 
appear to be a more objective means for relative scoring, and, in the 
hands of workers not skilled in statistics, a safer one. 

As an instrument for the understanding and use of laymen, the 
sigma deviations are no better adapted than the standard percentile 
ranks. Unless some simpler plan than these is made practical, 
relative scoring as a popular achievement will be postponed. 

The conclusions are urged: (1) That the procedure of percentile 
ranks, understood in general terms, should be strongly approved for 
many purposes and its use extended; and (2) that important gains in 
simplicity, clearness, and effective accuracy will be secured in ranking, 
if percentile points will be consistently and explicitly treated as division 
points between rank groups, rather than, as in the standard percentile 
system, as midpoints within groups. For ranks computed under 
this convention, the name “Centile Ranks” is proposed.

Suitable formulae and graphical devices for obtaining the percentile 
points are given in the various statistical text-books. These are 
already in practical use in many statistical offices. As a means 
toward saving the duplication of routine, and toward making the 
centile ranks available where secretarial assistants with time and 
training for the computations are not on call, there has been prepared 
in the Psychology Office of Smith College, a manual of tables and forms 
for obtaining centile ranks and for adapting centile ranks for use with 
various other mathematical procedures. The publication of the book 
has been delayed. It is hoped it will soon be on the market. 













THE RELIABILITY AND VALIDITY OF 
PHOTOGRAPHIC EYE-MOVEMENT RECORDS 


ALVIN C. EURICH 
University of Minnesota 


An early attempt to measure eye movements was reported in 1846. 
Shortly after the turn of the century, Dodge developed the technique 
for securing photographic records of the saccadic movements. During 
the last three decades, this method has been perfected to a marked 
degree in the construction of the apparatus at Chicago, Stanford, and 
more recently, at Minnesota. The availability of these instruments 
accounts, in part, for the rapidly growing body of literature involving 
eye-movement records. Curiously, little attention has been directed 
to an evaluation of the technique. Queries concerning the reliability 
and validity of the records have seldom arisen in the published reports. 
Without doubt, the objectivity of the records leads to inferences that
remain unverified by experimental analysis. In most of the studies 
few subjects were involved, chiefly because each record necessitates 
many painstaking and tedious tabulations. Furthermore, it is not a 
simple task to obtain adequate criteria for validating eye-movement 
records. 

In connection with a study of another problem,¹ data accrued from
which it was possible to obtain an estimate of the reliability and certain 
aspects of the validity of photographic measures of reading. Approxi- 
mately one hundred seventy-five students at the University of Minne- 
sota served as subjects in the study. Slightly more than fifty per 
cent of this group were on probation for deficient scholarship. The 
others were succeeding with their college work at the time the records 
were being collected. 

Each student reported for one period during which his eye-move- 
ments were photographed while he read the following two paragraphs 
of non-technical material. Both paragraphs appear as a part of the 
Minnesota Reading Examination. 


I. “It has lately been said in excuse for his action by one of the European 
dictators that freedom has failed and force is the only remedy. Making a wider 
survey of history I would say rather that force has failed and freedom is the only





1 Eurich, A. C.: “The Photographic Eye-movement Records of Successful and
Unsuccessful College Students.” In press, Journal of Applied Psychology.


Photographic Eye-movement Records 119 


remedy. Nothing has ever really been settled till the consent of all concerned has 
been obtained.” 

II. ‘‘But the West is not interested in France or in international affairs. They 
do not seem to know that we have reduced our Army, that Briand has shaken hands 
with Stresemann, that France is working for peace. In fact, many people think 
she is still working for war. Your country does not know what France has done 
and is still doing for peace.”


Since all of the subjects were registrants in the College of Education,
scores were available on the personnel folders for the Miller
Analogies Test, which serves as an intelligence test, and the Minnesota
Reading Examination for College Students. Both forms of the
Minnesota Speed of Reading Test and the Stanford Achievement Test
were administered to as large a portion of the group as could be assembled.
In addition, the marks in all subjects for the fall, winter, and
spring quarters, 1930-1931, were utilized as a measure of scholarship.

The reliability coefficients for the eye-movement records are
assembled in Table I. The term “fixational pauses” includes all
fixations for the first and second paragraphs. The relation between
the total number of fixations in the first paragraph and the number in
the second is expressed by the coefficient .74 ± .02. If this correlation
is regarded as the reliability coefficient for the number of fixations on
one paragraph, the reliability for the total number on both paragraphs
can be estimated with the application of the Spearman-Brown formula.
This technique yields a reliability coefficient of .85.


TABLE I. COEFFICIENTS OF RELIABILITY

                                            Correlation     Spearman-
                                     N      between para-   Brown
                                            graphs I and II coefficient

Fixational pauses:
  Number..........................  173      .74 ± .02        .85
  Duration........................  173      .83 ± .02        .91
Regressions:
  Number..........................  173      .62 ± .03        .77
  Duration of following fixations.  173      .58 ± .03        .73
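The step from the correlation between paragraphs to the coefficient for both paragraphs combined can be sketched with the usual form of the Spearman-Brown formula, r' = kr / (1 + (k − 1)r), with k = 2 for a measure of doubled length. The function name and the dictionary labels below are illustrative; the correlations are those reported in Table I.

```python
def spearman_brown(r, k=2):
    """Estimated reliability of a measure lengthened k times, given reliability r of one part."""
    return k * r / (1 + (k - 1) * r)


# Correlations between paragraphs I and II, as reported in Table I.
paragraph_correlations = {
    "number of fixations": 0.74,
    "duration of fixations": 0.83,
    "number of regressions": 0.62,
    "duration of fixations after regressions": 0.58,
}

stepped_up = {
    name: round(spearman_brown(r), 2)
    for name, r in paragraph_correlations.items()
}

for name, value in stepped_up.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two places, the results agree with the Spearman-Brown column of Table I (.85, .91, .77, and .73).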

The duration of fixational pauses was determined by counting the 
number of dots on the film. Each dot represents 149 second. The 
reliability of this measure of reading ability is somewhat higher than 
for the number of fixations. The correlation between paragraphs one 




and two is .83 ± .02. In this instance, the Spearman-Brown formula
gives a reliability coefficient of .91. Since visual impressions are 
ordinarily absent during the saccadic eye movements, this coefficient 
may be considered as expressive of the reliability of perception time 
photographically determined. 

The reliability coefficients for the number of regressive movements 
(.77) and the duration of the fixations following such movements (.73) 
are considerably lower. If these measures are to be used in diagnosing
individual cases, longer reading passages must be employed.


It is difficult indeed to obtain adequate criteria for estimating the 
validity of eye-movement records. Usually, it is assumed that the 
individual whose eyes pause frequently while reading is an ineffective 
reader both in terms of comprehension and rate of reading. If this 
assumption is valid, relatively high relationships should be obtainable 
between the photographic record and scores on reading comprehension 
and rate tests. The coefficients in Table II, however, reveal that this 
is not the case. The correlation between the Minnesota Reading 
Examination and the number of fixations on both paragraphs is only 
−.24 ± .05. Between this same test and the duration of fixational
pauses, the coefficient is even less (−.11 ± .05). In neither case is
the relationship high although the number of fixations seems to be a 
more adequate index of reading comprehension than the duration of 
the pauses. 


TABLE II. THE RELATION OF PHOTOGRAPHIC MEASURES OF READING ABILITY
TO INTELLIGENCE, READING, AND ACHIEVEMENT TESTS

                                                Number of     Duration of
                                                fixations     fixations

Minnesota Reading Examination..............    −.24 ± .05    −.11 ± .05
Minnesota Speed of Reading Test:
  [form label illegible]...................    −.16 ± .08    −.25 ± .08
  [form label illegible]...................    −.02 ± .10    −.09 ± .10
Miller Analogies Test......................    −.08 ± .05    −.04 ± .05
Stanford Achievement Test..................    −.05 ± .08    −.08 ± .08
Fall quarter marks, 1930...................     .03 ± .05    −.20 ± .05
Winter quarter marks, 1931.................     .05 ± .05    −.04 ± .05
Spring quarter marks, 1931.................    −.01 ± .06    −.06 ± .06











For the two forms of the speed of reading tests, the relationships 
are also low, the coefficients ranging from −.02 to −.25. Since the







Minnesota Speed of Reading Test has been demonstrated¹ to be a
fairly reliable and valid measure of rate of reading, it seems inconceiv- 
able that the relationship with the duration of fixational pauses should 
not be higher. If either one or both of these measures were unreliable, 
the low coefficients could be attributed to the inconsistency of the 
measuring instruments. The validity of the rate test was determined 
through relationships with other speed tests and with informal reading 
exercises. Since the latter approached an actual reading situation 
as closely as possible under conditions that yield a measure of the rate 
of reading, it seems legitimate to assume that the speed of reading test 
is a valid measure of the rate at which an individual ordinarily reads. 
The low relationships with duration of fixational pauses may be due to 
the fact that while the subject’s eye movements are being photo- 
graphed the reading situation is not a normal one. The room is 
totally dark with the exception of the illuminated printed matter. 
In addition a pencil of light is reflected upon the cornea of the eye. 
Some individuals are extremely sensitive to this light. The writer 
observed several cases where the stimulus caused an abnormal secre- 
tion of the lachrymal gland while the paragraphs were being read. A 
further annoyance results from the flicker caused by interrupting the 
beam of light every 149 second. It is entirely possible that such condi- 
tions affect some individuals more than others and consequently the 
rate at which they read may not be typical. The high reliability 
might still obtain because the subjects would probably be affected in 
the same manner while reading both paragraphs. While this may not 
be true at lower levels, it seems exceedingly plausible for college
students. 

The relationships with intelligence and achievement are also low 
or negligible. Whether achievement is measured with the Stanford 
Achievement Test or college marks seems to make little difference. 
Perhaps a more representative group of college students would alter 
the results. The data derived from the records of one hundred 
seventy-three students involved in this study, however, warrant the 
following concluding statement. Whatever is being measured by the 
number of fixations and the duration of the fixations as determined by 
photographic records, it seems clear that the trait differs from reading 
comprehension or rate of reading as measured by reliable tests, intelli- 
gence as determined by the Miller Hard Analogies Test, scores on the 





1 Eurich, A. C.: ‘‘The Reading Abilities of College Students.”” University of 
Minnesota Press, 1931. 




Stanford Achievement Test or college marks. None of these criteria 
furnishes a satisfactory degree of relationship with the eye-movement 
records. A query might well be raised at this point regarding the 
validity of fixational pauses as determined from photographic records. 


SUMMARY 


1. The reliability of various measures of competence in reading as 
determined from photographic eye-movement records is fairly high. 
If arranged in order of the degree of reliability, the measures would 
rank as follows: (a) The duration of fixational pauses or perception 
time, (b) the number of fixational pauses, (c) the number of regressions, 
and (d) the duration of fixations following regressive movements. 

2. The validity of photographic eye-movement records cannot be 
established with any of the following criteria: Minnesota Reading 
Examination for College Students, Minnesota Speed of Reading Test, 
the Stanford Achievement Test, or college marks. The relationship 
with each of these variables is exceedingly low or negligible. 








COMBINING ZERO-ORDER CORRELATION 
COEFFICIENTS 


JOHN W. DICKEY 


New Jersey State Normal School 
Newark, New Jersey 


We desire frequently to combine a number of zero-order product-
moment r’s into one such r. As examples, we have the correlations 
between arithmetic and reading, arithmetic and spelling, arithmetic 
and history, and so on, and we desire the combination of these separate 
r’s into one zero-order r. Again one experimenter may obtain the 
correlation between arithmetic and reading using one population, 
while another experimenter obtains the degree of relationship using a 
second population, and a third result is obtained using a third popula- 
tion, etc. It would be of interest in many circumstances to know the 
correlation between arithmetic and reading when we combine the 
separate results of these uncorrelated populations. The significance 
of the combined r would be greater than that of many of the individ-
ual r’s. A third example would be to know how one battery test 
correlates with a second battery test when we know the inter-correla- 
tions of the sub-tests. Other problems will arise in the reader’s mind. 
The need for a technique to combine the separate r’s into one zero- 
order r is evident. 

By definition, the correlation equivalent to the combined individual 
r’s is the correlation between the combined gross scores from a number 
of tests forming one battery, with the combined gross scores from a 
second number of tests forming a secondary battery. It is not nec- 
essary that the sub-tests of these two batteries be mutually exclusive. 
They may have any number of their parts in common. The usual 
procedure is that of computing the “average r,’’ which is the arithmetic 
mean of the separate r’s. This procedure is incorrect, and often leads 
to a value of r which is too small. 

It is the specific aim of this paper to develop a technique for the 
combining of any number of zero-order coefficients into one zero-order 
coefficient. It will be noted throughout that the combined r is always 
obtained from the significant data at hand, such as the r’s, $\sigma$’s, and 
the population size or sizes. This fact, together with the way in 
which the technique lends itself to algebraic treatment, adds to its 
usefulness. 




Let, 
$X_1$ = a gross score on test 1 
$X_2$ = a gross score on test 2 
etc. . . . 
$M_1$ = the mean gross score on test 1 
$M_2$ = the mean gross score on test 2 
etc. . . . 
$x_1 = X_1 - M_1$ 
$x_2 = X_2 - M_2$ 
etc. . . . 
$\sigma_1$ = the standard deviation of $X_1$ scores 
$\sigma_2$ = the standard deviation of $X_2$ scores 
etc. . . . 
$r$ = a zero-order product-moment correlation coefficient 
$r_{(12 \cdots j)(12 \cdots k)}$ = the coefficient equal to the combined r’s 
$N$ or $n$ = the population 
$\bar{n}$ = the arithmetic mean of the n’s 
$\sum$ = a summation 


In the generalized case for a correlated population, we desire the 
correlation between two battery scores, one battery having (j) sub-
tests, and the other battery having (k) sub-tests, (i) of which are 
common to both. The desired correlation is 








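$$r_{(12 \cdots j)(12 \cdots k)} = \frac{\sum (x_1 + x_2 + \cdots + x_j)(x_1 + x_2 + \cdots + x_k)}{N \sqrt{\dfrac{\sum (x_1 + x_2 + \cdots + x_j)^2}{N}} \sqrt{\dfrac{\sum (x_1 + x_2 + \cdots + x_k)^2}{N}}}$$

whence 

$$r_{(12 \cdots j)(12 \cdots k)} = \frac{(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_i^2) + 2(r_{12}\sigma_1\sigma_2 + \cdots + r_{i-1,i}\sigma_{i-1}\sigma_i) + (r_{1,i+1}\sigma_1\sigma_{i+1} + \cdots + r_{jk}\sigma_j\sigma_k)}{\left[(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_j^2) + 2(r_{12}\sigma_1\sigma_2 + \cdots + r_{j-1,j}\sigma_{j-1}\sigma_j)\right]^{1/2} \left[(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2) + 2(r_{12}\sigma_1\sigma_2 + \cdots + r_{k-1,k}\sigma_{k-1}\sigma_k)\right]^{1/2}} \qquad (1)$$

where the subscripts of $r_{(\,)(\,)}$ indicate the summed components of the 
respective battery scores. 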

In the numerator of formula (1) there are (jk) terms in the three 
quantities combined. The first quantity is composed of (i) terms 







of the $\sigma^2$ type, there being one $\sigma^2$ for each of the i-elements. The 
remaining (jk - i) terms are of the $r\sigma\sigma$ variety, (i) of which may be 
combined into such terms as $2r\sigma\sigma$, which is the second quantity in 
the numerator. The terms of the third quantity remain as $r\sigma\sigma$, there 
being (jk - 2i) in number. The $\sigma^2$ in the first quantity are the result 
of the cross products of the respective common elements. If there 
are no common elements between the two batteries (i = 0), the first 
two quantities do not appear. If the elements of one battery be those 
of the second battery (i = j = k), the numerator and denominator 
are identical and $r_{(\,)(\,)} = 1$ as a limiting value. 

In the denominator of formula (1) we have the product of two 
indicated, square-root quantities. The first one is obtained from the 
first battery; the second, from the second battery. The subscripts 
of $r_{(\,)(\,)}$ indicate these two quantities. Under the first indicated 
square-root there are two sub-quantities having ($j^2$) terms in all, (j) 
of which are of the $\sigma^2$ type (one for each of the j-elements), and j(j - 1) 
terms of the $2r\sigma\sigma$ type. The same reasoning applies to the second 
indicated, square-root quantity with its k-elements. These descrip-
tions of the numerator and the denominator will facilitate the 
application of formula (1) to any particular case. It may be said in 
passing that it is helpful to multiply an expression like $(x_1 + x_2 + x_3)$ 
$(x_1 + x_2 + x_4)$ and follow through the above discussion. 
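
The structure of formula (1) may be sketched in code. The following Python sketch is an editorial illustration and not part of the original paper; the function name `composite_r`, the zero-based indices, and the sample figures are all assumptions. It treats every $r\sigma\sigma$ term as a covariance and sums these over all pairings of sub-tests, so that a sub-test common to both batteries contributes its $\sigma^2$ without special handling.

```python
import math

def composite_r(r, sigma, first, second):
    """Correlation between two summed battery scores, in the manner of formula (1).

    r      -- square table of zero-order r's, with 1.0 on the diagonal
    sigma  -- standard deviations of the sub-tests
    first  -- indices of the sub-tests in the first battery
    second -- indices of the sub-tests in the second battery (may overlap)
    """
    def cov(a, b):
        # r * sigma * sigma; for a == b this is simply sigma squared
        return r[a][b] * sigma[a] * sigma[b]

    numerator = sum(cov(a, b) for a in first for b in second)
    var_first = sum(cov(a, b) for a in first for b in first)
    var_second = sum(cov(a, b) for a in second for b in second)
    return numerator / math.sqrt(var_first * var_second)

# Hypothetical figures for four sub-tests (illustration only).
R = [[1.0, 0.5, 0.4, 0.3],
     [0.5, 1.0, 0.6, 0.2],
     [0.4, 0.6, 1.0, 0.1],
     [0.3, 0.2, 0.1, 1.0]]
S = [10.0, 12.0, 8.0, 5.0]

# Batteries (x1 + x2 + x3) and (x1 + x2 + x4): j = k = 3 and i = 2.
print(round(composite_r(R, S, [0, 1, 2], [0, 1, 3]), 4))
# Identical batteries give the limiting value of unity.
print(round(composite_r(R, S, [0, 1, 2], [0, 1, 2]), 4))
```

Summing over all pairings avoids counting the three quantities of the numerator separately; the result is the same as multiplying out the two sums by hand.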

With uncorrelated populations, where population sizes may also 
differ, formula (1) results in a slightly new form. By a process similar 
to that used to derive formula (1), we obtain 








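$$r_{(1\,2' \cdots k)(2\,3' \cdots k+1)} = \frac{\sum (x_1 + x'_2 + \cdots + x_k)(x_2 + x'_3 + \cdots + x_{k+1})}{\bar{n} \sqrt{\dfrac{\sum (x_1 + x'_2 + \cdots + x_k)^2}{\bar{n}}} \sqrt{\dfrac{\sum (x_2 + x'_3 + \cdots + x_{k+1})^2}{\bar{n}}}}$$

whence 

$$r_{(1\,2' \cdots k)(2\,3' \cdots k+1)} = \frac{n_{12} r_{12} \sigma_1 \sigma_2 + n_{2'3'} r_{2'3'} \sigma_{2'} \sigma_{3'} + \cdots + n_{k,k+1} r_{k,k+1} \sigma_k \sigma_{k+1}}{\left[n_{12}\sigma_1^2 + n_{2'3'}\sigma_{2'}^2 + \cdots + n_{k,k+1}\sigma_k^2\right]^{1/2} \left[n_{12}\sigma_2^2 + n_{2'3'}\sigma_{3'}^2 + \cdots + n_{k,k+1}\sigma_{k+1}^2\right]^{1/2}} \qquad (2)$$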


where primes, double primes, etc., may occur in either subscript, and 
are used to indicate sets of scores made by different populations on 
















the same named test. We may assume that there are no common 
elements (i = 0) between the two battery scores.¹ 

The numerator of formula (2) is composed of a series of terms 
each of which is the product of four elements, namely, the zero-order 
r already computed, the respective n, and the two $\sigma$’s involved in 
computing this r. The total number of such numerator terms is 
$(k)(k' - 1) = (k^2) = (k' - 1)^2$, that is, the two batteries always 
have the same number of sub-tests. 

In the denominator of formula (2) we again have the product of 

two indicated, square-root quantities. As in formula (1), the first 
quantity is obtained from the first battery; the second, from the second 
battery. The subscripts of $r_{(\,)(\,)}$ indicate these two quantities as they 
did in formula (1). Each of the two quantities is made up of (k) or 
(k' - 1) terms (k = k' - 1) of the $n\sigma^2$ type. Each n-coefficient of 
the $\sigma^2$ is the population that was used in its computation. 

The reliability of the r’s resulting from formula (1) and formula 
(2) is obtained by the familiar, approximate formula $\sigma_r = \dfrac{1 - r^2}{\sqrt{N}}$. 
When formula (2) is used $\bar{n}$ must be used instead of N. 

Two examples follow to illustrate the numerical substitution in 
formulas (1) and (2). The hypothetical data in Table I will illustrate 
formula (1); the hypothetical data in Table II, formula (2). 


TABLE I. HYPOTHETICAL INTER-CORRELATION DATA 
(N = 100) 

Test      1       2       3       σ’s 
1        ...     ...     ...      10 
2        0.80    ...     ...      15 
3        0.30    0.40    ...      10 
4        0.70    0.75    0.30      5 

















Let it be desired to find $r_{(1)(234)}$, that is, the value of the combined 
r for the $r_{12}$, $r_{13}$, and $r_{14}$ values which is the correlation of arithmetic 
with reading, spelling, and history combined. Here j = 1, k = 3, 
and i = 0. Formula (1) becomes 





1 The scores on test 2' are made by population ($n_{2'3'}$), whereas the scores on 
test 2 (which is the same test as test 2') are made by population ($n_{12}$) and may 
therefore be considered as two distinct series of scores. 




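$$r_{(1)(234)} = \frac{r_{12}\sigma_1\sigma_2 + r_{13}\sigma_1\sigma_3 + r_{14}\sigma_1\sigma_4}{\sqrt{\sigma_1^2}\,\sqrt{\sigma_2^2 + \sigma_3^2 + \sigma_4^2 + 2(r_{23}\sigma_2\sigma_3 + r_{24}\sigma_2\sigma_4 + r_{34}\sigma_3\sigma_4)}}$$

$$= \frac{0.8 \times 10 \times 15 + 0.3 \times 10 \times 10 + 0.7 \times 10 \times 5}{\sqrt{100}\,\sqrt{225 + 100 + 25 + 2(0.4 \times 15 \times 10 + 0.75 \times 15 \times 5 + 0.3 \times 10 \times 5)}}$$

$$= \frac{185}{10\sqrt{612.5}} = 0.75 \pm 0.029$$

The correlation of $r_{(1)(234)}$ is seen to be 0.75 ± 0.029, whereas with 
the incorrect method of “average r” it is 0.60 ± ?. Many other 
interesting correlations such as $r_{(3)(124)}$, $r_{(4)(123)}$, $r_{(12)(34)}$, etc., could be 
computed readily with the data at hand. 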
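
As a check on the arithmetic, the substitution for $r_{(1)(234)}$ may be repeated in a few lines of Python. This is an editorial sketch, not the author's procedure, and the variable names are assumptions. It also evaluates $(1 - r^2)/\sqrt{N}$; the ± figure quoted in the text agrees with 0.6745 times that quantity, that is, with a probable error, which is an inference from the numbers rather than something the paper states.

```python
import math

N = 100
sigma = {1: 10, 2: 15, 3: 10, 4: 5}
r = {(1, 2): 0.80, (1, 3): 0.30, (2, 3): 0.40,
     (1, 4): 0.70, (2, 4): 0.75, (3, 4): 0.30}

def cov(a, b):
    """r * sigma * sigma for two different tests."""
    return r[(min(a, b), max(a, b))] * sigma[a] * sigma[b]

battery = [2, 3, 4]  # reading, spelling, history

# Numerator: arithmetic (test 1) paired with each member of the battery.
numerator = sum(cov(1, t) for t in battery)

# Quantity under the second square root: variance of the summed battery score.
var_battery = (sum(sigma[t] ** 2 for t in battery)
               + 2 * (cov(2, 3) + cov(2, 4) + cov(3, 4)))

combined = numerator / (sigma[1] * math.sqrt(var_battery))
average_r = sum(r[(1, t)] for t in battery) / len(battery)

sigma_r = (1 - combined ** 2) / math.sqrt(N)
probable_error = 0.6745 * sigma_r  # assumed reading of the +/- figure

print(round(combined, 2), round(average_r, 2), round(probable_error, 3))
```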

The data used to illustrate formula (2) follow, where 
1, 1', 1'', etc., is arithmetic; 2, 2', 2'', etc., is reading; 3, 3', 3'', etc., is 
spelling; and 4, 4', 4'', etc., is history. 







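TABLE II. HYPOTHETICAL DATA 
(Uncorrelated populations) 

$r_{12}$ = 0.80        $n_{12}$ = 100        $\sigma_1$ = 10        $\sigma_2$ = 15 
$r_{1'3'}$ = 0.30      $n_{1'3'}$ = 200      $\sigma_{1'}$ = 12      $\sigma_{3'}$ = 11 
$r_{2''3''}$ = 0.40    $n_{2''3''}$ = 300    $\sigma_{2''}$ = 16     $\sigma_{3''}$ = 10 
$r_{3'''4}$ = 0.30     $n_{3'''4}$ = 400     $\sigma_{3'''}$ = 12    $\sigma_4$ = 6 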





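Let it be desired to find $r_{(1\,1'\,2''\,3''')(2\,3'\,3''\,4)}$, that is, the correlation of 
one battery composed of arithmetic, reading, and spelling with a 
second battery composed of reading, spelling, and history (the first 
column of subscripts with the second column¹). Formula (2) becomes 

$$r_{(1\,1'\,2''\,3''')(2\,3'\,3''\,4)} = \frac{n_{12} r_{12} \sigma_1 \sigma_2 + n_{1'3'} r_{1'3'} \sigma_{1'} \sigma_{3'} + n_{2''3''} r_{2''3''} \sigma_{2''} \sigma_{3''} + n_{3'''4} r_{3'''4} \sigma_{3'''} \sigma_4}{\sqrt{n_{12}\sigma_1^2 + n_{1'3'}\sigma_{1'}^2 + n_{2''3''}\sigma_{2''}^2 + n_{3'''4}\sigma_{3'''}^2}\,\sqrt{n_{12}\sigma_2^2 + n_{1'3'}\sigma_{3'}^2 + n_{2''3''}\sigma_{3''}^2 + n_{3'''4}\sigma_4^2}}$$

$$= \frac{100 \times 0.8 \times 10 \times 15 + 200 \times 0.3 \times 12 \times 11 + 300 \times 0.4 \times 16 \times 10 + 400 \times 0.3 \times 12 \times 6}{\sqrt{100 \times 100 + 200 \times 144 + 300 \times 256 + 400 \times 144}\,\sqrt{100 \times 225 + 200 \times 121 + 300 \times 100 + 400 \times 36}}$$

$$= \frac{47{,}760}{\sqrt{173{,}200}\,\sqrt{91{,}100}} = 0.38 \pm 0.036$$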











1 Any or all of the two original subscripts may be interchanged without change 
in the numerator value of formula (2). One may experiment in this manner to 
produce the greatest difference between the two square-root quantities of the 







The correlation between these two batteries is seen to be 0.38 ± 0.036, 
whereas the “average r” is 0.45 ± ?. 
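
The substitution in formula (2) may be verified in the same manner. The sketch below is an editorial illustration under stated assumptions: the function name `pooled_r` and the layout of the data as tuples are not from the paper, and the ± figure is again taken to be 0.6745 times $(1 - r^2)/\sqrt{\bar{n}}$.

```python
import math

# One tuple per pair of series in Table II:
# (r, n, sigma in the first battery, sigma in the second battery)
pairs = [
    (0.80, 100, 10, 15),
    (0.30, 200, 12, 11),
    (0.40, 300, 16, 10),
    (0.30, 400, 12, 6),
]

def pooled_r(pairs):
    """Combine r's from uncorrelated populations in the manner of formula (2)."""
    numerator = sum(n * r * s1 * s2 for r, n, s1, s2 in pairs)
    first = sum(n * s1 ** 2 for _, n, s1, _ in pairs)
    second = sum(n * s2 ** 2 for _, n, _, s2 in pairs)
    return numerator / math.sqrt(first * second)

combined = pooled_r(pairs)
mean_n = sum(n for _, n, _, _ in pairs) / len(pairs)      # n-bar
average_r = sum(r for r, _, _, _ in pairs) / len(pairs)   # the "average r"
probable_error = 0.6745 * (1 - combined ** 2) / math.sqrt(mean_n)

print(round(combined, 2), round(average_r, 2), round(probable_error, 3))
```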


CONCLUSIONS 


1. The zero-order product-moment correlation coefficient equiva- 
lent to such combined coefficients has been defined in terms of the 
correlation between the combined original scores. 

2. Formulas (1) and (2) are used when the populations are corre-
lated and uncorrelated, respectively. 


3. The combined coefficient is computed by applying formulas 
(1) and (2) to the significant data already at hand. 

4. Two numerical examples have been given to illustrate formulas 
(1) and (2). 

5. It is hoped that this method of combining coefficients, although 
it is more involved, will supplant the incorrect “average r” method. 





denominator which will result in a maximum $r_{(\,)(\,)}$ value. To produce the mini-
mum $r_{(\,)(\,)}$ value the difference between these two quantities must be made a 
minimum. 










A METHOD FOR TRANSFORMING ANY UNIMODAL 
FREQUENCY DISTRIBUTION INTO A NORMAL 
DISTRIBUTION 


PAUL HORST 
Personnel Research Department, Procter and Gamble Company 


Various general methods have been developed for giving analytical 
description to frequency distributions with single modes or maxima. 
Perhaps the best known methods are those coming from Pearson and 
Charlier. The work of the former is, on the whole, more accessible 
in this country and in England than is that of the latter. 

We shall not, in this discussion, enter into the advantages to be 
derived from giving analytical representation to frequency distribu- 
tions. These advantages are obvious enough to anyone familiar with 
the statistical analysis of large bodies of data. To clarify subsequent 
discussion, however, we may refer briefly to the general method of 
analysis developed by Pearson. 

Pearson writes the differential equation to the generalized fre-
quency distribution thus 

$$\frac{dy}{dx} = \frac{(x - a)y}{b_0 + b_1 x + b_2 x^2 + \cdots} \qquad (1)$$

so that the derivative vanishes for x = a and for y = 0. The con-
stants in (1) are functions of the moments of the experimental fre-
quency distribution. The integrated expression takes widely different 
forms according to different values of these constants. In particular, 
if $b_0 = -1$ and $a = b_1 = b_2 = \cdots = 0$ then (1) becomes simply 

$$y = Ae^{-\frac{x^2}{2}} \qquad (2)$$


which is, of course, the expression for the normal frequency curve. 

In general, we may note that the normal frequency function 
presents certain advantages, with reference to mathematical manipula- 
tion and practical application, over the more complicated forms. 
Mathematically, the problems of multiple correlation are much more 
easily handled when all the variables are normally distributed than 
when some or all of their distributions deviate from the normal curve. 
For example, if all the variables considered are normally distributed 
the multiple correlation surface can always be expressed in the form 

$$z = Be^{-\left(\sum a_{ij} x_i x_j\right)} \qquad i = (1 \cdots n), \quad j = (1 \cdots n)$$


where the constants in the exponent are functions of the zero-order 
correlation coefficients. Thus far it has not been possible to develop 
a generalized expression for correlation surfaces which are not normal. 
Even if such a form were developed it would be extremely complicated 
and quite impractical for application to experimental data. 

Furthermore, the measurements of variables may often be accom- 
plished in several ways. And the form of the distribution will vary 
with the method of measurement employed. Thus it conduces to 
uniformity to specify the method of measurement in such a manner 
that the distribution will be normal. 

A practical advantage accruing from having all distributions 
normal finds exemplification in the case of educational and psycho- 
logical data. Here the variables involved are measured in all sorts 
of units, and to make valid comparisons from one variable to another 
necessitates a reduction to comparable units. While for compara- 
bility alone, any form may be adopted so long as all distributions are 
reduced to it, the normal form has the added advantage that per- 
centages of area may be readily converted to base-line values and 
vice versa, by means of the numerous tables available for this purpose. 
For the most part the integration of Pearson’s curves requires the use 
of quadrature formulas. 

There are other important, though less obvious theoretical advan- 
tages, in working with normal distributions, but these mentioned will 
indicate the desirability of a general method for converting any 
frequency distribution into a normal distribution. 

Analytically it would be logical to start with Pearson’s generalized 
differential equation 








$$\frac{dy}{dx} = \frac{(x - a)y}{b_0 + b_1 x + b_2 x^2 + \cdots} \qquad (1)$$

make the transformation 

$$z = -\frac{(x - a)}{b_0 + b_1 x + b_2 x^2 + \cdots} \qquad (3)$$

and write 

$$\frac{dy}{dz} = -zy \qquad (4)$$













so that 

$$y = Ae^{-\frac{z^2}{2}} \qquad (5)$$

giving a normal distribution in the new variable z. The constants 
may be determined by straightforward algebra from the moments of 
the experimental distribution. For purposes of calculation, however, 
z in (3) is more conveniently expressed as a power series in x, thus 

$$z = a_0 + a_1 x + a_2 x^2 + \cdots \qquad (6)$$

It is quite possible to write (3) in the form 

$$z = -(x - a)(b_0 + b_1 x + b_2 x^2 + \cdots)^{-1} \qquad (7)$$


to expand the second factor in the right hand member and multiply 
by the first. This procedure, however, may involve rather laborious 
computations, first in the determination of the constants, and second, 
in the expansion of (7). Furthermore, difficulties with reference to 
convergence of the resulting series may be encountered. Hence the 
following method is outlined. 

First we let the original frequency distribution be represented by 

$$u = \theta(x) \qquad (8)$$

We require that the transformed distribution be 

$$v = \phi(z) = Ae^{-\frac{z^2}{2}} \qquad (9)$$

We write 

$$\Theta(x) = \int \theta(x)\,dx \qquad (10)$$

and 

$$\Phi(z) = \int \phi(z)\,dz \qquad (11)$$

We wish to impose the condition 

$$\Phi(z) = \Theta(x) \qquad (12)$$

within the limits of the transformed and the original distributions 
respectively. We write (12) as an explicit function of x, thus 

$$z = f(x)$$

and assume that within the limits of the experimental data the func-
tion may be adequately represented as a function of x by 

$$z = f(x) = a_0 + a_1 x + a_2 x^2 + \cdots \qquad (6)$$










The problem is to determine the constants in (6). We assume 
that for all practical purposes the total area of the curve (9) is suffi-
ciently approximated by 

$$\int_{-3}^{3} \phi(z)\,dz$$

so that the area below any particular value of z is 

$$P_z = \int_{-3}^{z} \phi(z)\,dz$$

Now the area represented by (8) is 

$$1 = \int_{a}^{b} \theta(x)\,dx$$

where a and b are the lower and upper roots respectively of $\theta(x)$. 
Then for any particular value of x we have 

$$P_x = \int_{a}^{x} \theta(x)\,dx$$

We wish to impose the condition 

$$\int_{a}^{x} \theta(x)\,dx = \int_{-3}^{z} \phi(z)\,dz \qquad (13)$$

In terms of the original data we have 

$$\frac{\sum_{a}^{x} n_i}{N} = \int_{a}^{x} \theta(x)\,dx \qquad (14)$$

where $n_i$ represents the frequency of any given value of x and N is 
the total number of cases. Substituting (14) in (13) we have 

$$\frac{\sum_{a}^{x} n_i}{N} = \int_{-3}^{z} \phi(z)\,dz \qquad (15)$$

Now for any value of x in the left hand side of (15) we may calculate 
a numerical proportion. Having given the numerical proportion 
for the value of the definite integral in (15), the upper limit z may be 
readily determined from a table of the normal probability integral. 
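
The table look-up just described can be sketched in Python. This is an editorial illustration, not the author's procedure: the function name `sigma_values` and the class frequencies are hypothetical, and `statistics.NormalDist().inv_cdf` from the standard library stands in for the printed table of the normal probability integral. The inverse is taken over the whole curve rather than from -3, a difference of about one part in a thousand in the proportion; the result is merely held within the limits -3 and 3.

```python
from statistics import NormalDist

def sigma_values(frequencies, limit=3.0):
    """Cumulative proportion and z-value at each interior class boundary."""
    total = sum(frequencies)
    running = 0
    rows = []
    for f in frequencies[:-1]:      # the last boundary is the top of the range
        running += f
        p = running / total         # left-hand side of (15)
        if p <= 0.0:
            z = -limit
        elif p >= 1.0:
            z = limit
        else:
            z = NormalDist().inv_cdf(p)
            z = max(-limit, min(limit, z))
        rows.append((round(p, 3), round(z, 2)))
    return rows

# Hypothetical class frequencies for eight intervals (illustration only).
for p, z in sigma_values([5, 20, 60, 115, 115, 60, 20, 5]):
    print(p, z)
```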



















Thus, for successive values of x we obtain the series of equations 

$$a_0 + a_1 x_a + a_2 x_a^2 + \cdots = z_a$$
$$a_0 + a_1 x_{a+1} + a_2 x_{a+1}^2 + \cdots = z_{a+1} \qquad (16)$$

We may solve for as many constants in the polynomial f(x) as 
we calculate proportions for distinct values of x. For practical 
purposes we may assume that a quartic adequately represents the 
transformation. With this assumption proportions for five distinct 
values of x would uniquely determine the constants. More than 
five determinations would require some form of adjustment of observa- 
tions except in the purely hypothetical case where the quartic exactly 
represents the transformation. But since this condition can rarely 
be satisfied with experimental data, and since it would be difficult 
to select the five determinations which are most typical of the distribu- 
tion, we make more than five determinations and solve for the con- 
stants by the method of least squares. 

The solution may, however, be simplified by properly selecting 
the x-values from which the z’s are determined, and by making a 
convenient linear transformation of these x-values. 

Suppose we determine the constants in f(x) by finding the z-values 
for nine distinct values of x. First we determine the range of x and 
divide this range into eight equal parts so that each class interval will 
be of length B. We adopt the mid-point of the range, A let us say, 
as the origin. The linear transformation then is 

$$X = \frac{x - A}{B} \qquad (17)$$

We find the percentage of cases lying below X for the nine values 
-4, -3, -2, -1, 0, 1, 2, 3, 4, and from the percentages we obtain 
the z-values as given in a normal probability table. 

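Now the normal equations from which to determine the constants 
are 

$$a_0 N + a_1 \sum X + a_2 \sum X^2 + a_3 \sum X^3 + a_4 \sum X^4 = \sum z$$
$$a_0 \sum X + a_1 \sum X^2 + a_2 \sum X^3 + a_3 \sum X^4 + a_4 \sum X^5 = \sum Xz$$
$$a_0 \sum X^2 + a_1 \sum X^3 + a_2 \sum X^4 + a_3 \sum X^5 + a_4 \sum X^6 = \sum X^2 z \qquad (18)$$
$$a_0 \sum X^3 + a_1 \sum X^4 + a_2 \sum X^5 + a_3 \sum X^6 + a_4 \sum X^7 = \sum X^3 z$$
$$a_0 \sum X^4 + a_1 \sum X^5 + a_2 \sum X^6 + a_3 \sum X^7 + a_4 \sum X^8 = \sum X^4 z$$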
















Since the summations run from X = 4 to X = -4 they may be 
calculated once and for all for any problem. We get then for the 
summations on the left hand side of (18) 

$$N = 9 \qquad\qquad \sum X = 0$$
$$\sum X^2 = 60 \qquad\qquad \sum X^3 = 0$$
$$\sum X^4 = 708 \qquad\qquad \sum X^5 = 0 \qquad (19)$$
$$\sum X^6 = 9{,}780 \qquad\qquad \sum X^7 = 0$$
$$\sum X^8 = 144{,}708$$

and remembering that $z_{-4} = -3$ and $z_4 = 3$ we get for the summations 
on the right hand side of (18) 

$$\sum z = z_{-3} + z_{-2} + z_{-1} + z_0 + z_1 + z_2 + z_3$$
$$\sum Xz = -3z_{-3} - 2z_{-2} - z_{-1} + z_1 + 2z_2 + 3z_3 + 24$$
$$\sum X^2 z = 9z_{-3} + 4z_{-2} + z_{-1} + z_1 + 4z_2 + 9z_3 \qquad (20)$$
$$\sum X^3 z = -27z_{-3} - 8z_{-2} - z_{-1} + z_1 + 8z_2 + 27z_3 + 384$$
$$\sum X^4 z = 81z_{-3} + 16z_{-2} + z_{-1} + z_1 + 16z_2 + 81z_3$$
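
The fixed summations can be checked mechanically. The short Python sketch below is an editorial illustration (the names `power_sums` and `right_hand_sides` are assumptions); it recomputes the power sums of the nine values -4 to 4, for which the eighth-power sum comes to 144,708, and it isolates the constants 24 and 384 contributed to the right-hand sides by the fixed end values $z_{-4} = -3$ and $z_4 = 3$.

```python
X = range(-4, 5)  # the nine values -4, -3, ..., 4

# Left-hand summations: sum of X**p for p = 0 (the count N) through 8.
power_sums = {p: sum(x ** p for x in X) for p in range(9)}
print(power_sums)

def right_hand_sides(z_inner):
    """Sums of X**p * z for p = 0..4.

    z_inner holds z_-3 ... z_3; the end values are fixed at -3 and 3.
    """
    z = [-3.0] + list(z_inner) + [3.0]
    return [sum(x ** p * zv for x, zv in zip(X, z)) for p in range(5)]

# With the inner z's set to zero, only the end values contribute.
print(right_hand_sides([0.0] * 7))
```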


If we substitute the values of the summations in (19) and (20) in 
(18) the a’s may be solved for as linear functions of the z’s, so that 

$$a_0 = b_{-3,0} z_{-3} + b_{-2,0} z_{-2} + \cdots + b_{3,0} z_3$$
$$a_1 = b_{-3,1} z_{-3} + b_{-2,1} z_{-2} + \cdots + b_{3,1} z_3 + K_1$$
$$a_2 = b_{-3,2} z_{-3} + b_{-2,2} z_{-2} + \cdots + b_{3,2} z_3 \qquad (21)$$
$$a_3 = b_{-3,3} z_{-3} + b_{-2,3} z_{-2} + \cdots + b_{3,3} z_3 + K_2$$
$$a_4 = b_{-3,4} z_{-3} + b_{-2,4} z_{-2} + \cdots + b_{3,4} z_3$$

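The a-values in (18) can be readily calculated having given any 
set of experimental z-values, but these a-values are obtained for the 
linearly transformed values of x given by (17). If in the equation 

$$z = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4 \qquad (22)$$

we substitute $\dfrac{x - A}{B}$ for X we get 

$$z = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 \qquad (23)$$

where 

$$c_0 = a_0 - a_1\left(\frac{A}{B}\right) + a_2\left(\frac{A}{B}\right)^2 - a_3\left(\frac{A}{B}\right)^3 + a_4\left(\frac{A}{B}\right)^4$$
$$c_1 = \frac{1}{B}\left[a_1 - 2a_2\left(\frac{A}{B}\right) + 3a_3\left(\frac{A}{B}\right)^2 - 4a_4\left(\frac{A}{B}\right)^3\right]$$
$$c_2 = \frac{1}{B^2}\left[a_2 - 3a_3\left(\frac{A}{B}\right) + 6a_4\left(\frac{A}{B}\right)^2\right] \qquad (24)$$
$$c_3 = \frac{1}{B^3}\left[a_3 - 4a_4\left(\frac{A}{B}\right)\right]$$
$$c_4 = \frac{a_4}{B^4}$$

The c-values, then, are the coefficients required to transform the 
original distribution of raw x-values into a normal distribution of 
z-values, with a mean of zero and a standard deviation of unity.¹ 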
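
The passage from the a-values to the c-values is a binomial expansion of $((x - A)/B)^m$, and may be sketched as follows. This Python fragment is an editorial illustration; the function name `to_raw_scale` and the sample a-values are hypothetical and are not taken from the paper.

```python
from math import comb

def to_raw_scale(a, A, B):
    """c0..c4 in powers of x, from a0..a4 in powers of X = (x - A)/B."""
    ratio = A / B
    c = []
    for k in range(5):
        # Coefficient of x**k: each a_m with m >= k contributes
        # comb(m, k) * (-A/B)**(m - k), and the whole is divided by B**k.
        total = sum(comb(m, k) * (-ratio) ** (m - k) * a[m] for m in range(k, 5))
        c.append(total / B ** k)
    return c

# Hypothetical a-values, with a mid-point A and class interval B chosen for illustration.
a = [0.1, 0.7, -0.02, 0.01, 0.001]
A, B = 37.5, 8.0
c = to_raw_scale(a, A, B)

x = 50.0
X = (x - A) / B
print(sum(a[m] * X ** m for m in range(5)))   # z from equation (22)
print(sum(c[k] * x ** k for k in range(5)))   # z from equation (23)
```

The two printed values should agree to within rounding, since (22) and (23) describe the same curve in different units.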
For the solution of any particular problem, however, the above 
theory need not be followed through. It is necessary only to set up 
the work sheet as indicated below. 

In column 1', section A write the class interval limits in which 
the original distribution has been divided. There must be exactly 
eight class intervals. As a rule the range will not be exactly divisible 
by eight. Suppose the range is L, then let us say 

$$L = 8s + d$$

It is always possible to choose s so that the absolute value of d 
is not greater than 4. If d is even, d/2 is added to the first and last 
class intervals of the distribution. If d is odd, $\dfrac{d - 1}{2}$ and $\dfrac{d + 1}{2}$ are 
added to the extreme intervals respectively, so that the interval having 
the lowest frequency will be the longer of the two. 

In column 2', write the class frequencies. In column 3', cumulate 
these frequencies. In column 4', convert these into percentage values. 
In column 5', write the sigma values of these percentages as determined 
from a table of the normal probability integral. 


~ 





1 Through a misunderstanding the material that follows was sent in a some-
what different form as a part of another article to The Journal of Applied Psy-
chology, and published in the June, 1932, issue of that periodical while the present 
article in its entirety was awaiting its turn for publication in this Journal. 














































































































[Work sheet for calculating the constants of the transformation equation (sections A to D); the tabulated values are illegible in this copy.] 


Columns 1 to 5 inclusive of section A are the b-values in equations 
(21) as determined by a numerical solution of equations (18). 
These values have been determined by the writer and remain the same 
for any problem. They may be copied directly from the above table. 

Column S of section A gives horizontal summations of columns 
1-5. This column also remains unchanged and may also be copied 
directly. 

Columns 1-S inclusive through lines 1-7 inclusive of section B 
are obtained by multiplying columns 1-S inclusive of section A by 
column 5’. The values in these columns will depend on the values in 
column 5’ which in turn are derived from the particular distribution 
which is being transformed. 

The values in line 8 of section B are constants which are copied 
directly from the table. 

Line 9 of section B gives the vertical summations of column 1-S 
in section B. If the multiplications and additions thus far have 
been correct the algebraic sum of the first 5 entries in line 9 of section 
B should equal the last entry to within several points in the last 
decimal. These first five values are the successive a-values in equation 
(22). 

In sections C and D the c-values are determined for equation (23) 
by means of equations (24). 

The diagonal of 1’s is constant for any problem. The signs 
in section C are also the same for any problem. The other values 
in section C are determined as follows. 

In line 1 column 2 write A/B, that is the mid-point of the range 
divided by the length of the class interval. In column 3, square 
this value. In column 4 cube it. In column 5 raise it to the 4th 
power. Symbolically the first line will be 


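$$1, \quad -\frac{A}{B}, \quad \left(\frac{A}{B}\right)^2, \quad -\left(\frac{A}{B}\right)^3, \quad \left(\frac{A}{B}\right)^4$$

the second, 

$$1, \quad -2\frac{A}{B}, \quad 3\left(\frac{A}{B}\right)^2, \quad -4\left(\frac{A}{B}\right)^3$$

the third, 

$$1, \quad -3\frac{A}{B}, \quad 6\left(\frac{A}{B}\right)^2$$

the fourth, 

$$1, \quad -4\frac{A}{B}$$

the fifth, 

$$1$$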


These values in the rows subsequent to row 1 may be readily 
calculated by putting first A/B in the machine and multiplying 
successively by 2, 3, and 4, making the entries diagonally downward 
toward the right. Then $(A/B)^2$ is put in the machine and multiplied 
successively by 3 and 6. Finally $(A/B)^3$ is multiplied by 4 and entered 
in line 2 column 5, completing the section down to line (6). 

Line 6 consists of the values 


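$$1, \quad \left(1 - \frac{A}{B}\right), \quad \left(1 - \frac{A}{B}\right)^2, \quad \left(1 - \frac{A}{B}\right)^3, \quad \left(1 - \frac{A}{B}\right)^4$$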


The successive vertical summations of the columns in section C should 
equal the values in line 6 respectively to within several points in the 
last decimal. 
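
This check on section C follows from the binomial theorem, as the sketch below illustrates. It is an editorial example in Python; the function name `section_c` and the value taken for A/B are assumptions made for the illustration.

```python
from math import comb

def section_c(ratio):
    """Lines 1-5 of section C for a given A/B.

    Entry [k][m] is the multiplier of a_m in the bracket for c_k;
    entries to the left of the diagonal of 1's are zero.
    """
    return [[comb(m, k) * (-ratio) ** (m - k) if m >= k else 0.0
             for m in range(5)] for k in range(5)]

ratio = 37.5 / 8  # hypothetical A/B
rows = section_c(ratio)

# Vertical summations of the five columns, and the values of line 6.
column_sums = [sum(rows[k][m] for k in range(5)) for m in range(5)]
line_6 = [(1 - ratio) ** m for m in range(5)]

for total, check in zip(column_sums, line_6):
    print(round(total, 6), round(check, 6))
```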

Next multiply each column in section C by the value immediately 
above it in line 9 of section B and enter the products in columns 1-5 
inclusive of section D as indicated. 

Column S of section D is a horizontal summation of the preceding 
columns in section D. The vertical summation of the first 5 entries 
in column S, section D, should equal the horizontal summation 
of the first 5 entries in line 6 of section D to within several points in the 
last decimal. 


In column B section D make the entries 

$$1, \quad \frac{1}{B}, \quad \frac{1}{B^2}, \quad \frac{1}{B^3}, \quad \frac{1}{B^4}$$


where B is the length of the class interval. Then multiply column S 
by column B and write the products in column C of section D. These 
values give the coefficients in equation (23) and are the constants 
required to transform the values in the original distribution into 
values which yield a normal distribution with a mean of zero and a 
standard deviation of unity. Thus for the distribution given by 
column 2’ of section A the transformation equation is 


$$z = -4.0515 + .1963x - .00282x^2 + .0000148x^3 + .0000000839x^4$$


To transform an entire distribution of values, a transmutation 
table may be readily constructed. The range of the distribution of 
original x-values is known. The first four columns of the work sheet 
will contain the first four powers of the series. These may be copied 
from any table of powers of numbers. The next four columns are 
simply the first four columns multiplied by $c_1$, $c_2$, $c_3$, $c_4$, respectively. 
The ninth column is the horizontal summation of the second set of 
four columns plus the constant $c_0$. Thus the first column gives the 
original x-values while the ninth column gives the desired z-values. 
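
A transmutation table of this kind can be produced in a few lines. The Python sketch below is an editorial illustration: the coefficients are those of the transformation equation printed above, the range of raw scores from 6 to 69 is an assumption made for the example, and Horner's rule is substituted for the columns of powers described in the text.

```python
# c0..c4 of the printed transformation equation.
coefficients = [-4.0515, 0.1963, -0.00282, 0.0000148, 0.0000000839]

def transmute(x, c=coefficients):
    """z-value for a raw score x, evaluated by Horner's rule."""
    z = 0.0
    for coef in reversed(c):
        z = z * x + coef
    return z

# First column: raw x-values; second column: the corresponding z-values.
table = [(x, round(transmute(x), 2)) for x in range(6, 70)]
for x, z in table[::8]:
    print(x, z)
```

Horner's rule gives the same values as summing the separate products, with fewer multiplications and no need for a table of powers.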

It should be noted that the method above outlined is applicable 
only to comparatively large bodies of data. It should not be employed 
with less than three or four hundred cases. 










INDIVIDUAL VS. GROUP INSTRUCTION IN 
GRAMMATICAL USAGE¹ 


PAUL C. WARNER 
Principal Greenville (Ohio) High School 


AND 


WALTER S. GUILER 
Professor of Education, Miami University 


PROBLEM 


The purpose of the investigation reported in this article was to 
discover the relative merits of individual and group instruction in 
grammatical usage. More specifically, the problem was to determine 
whether individual instruction following individual diagnosis is more 
effective than group instruction in the remediation of shortcomings 
in the application of grammatical principles. The investigation also 
sought to throw light on the following problems: 

1. What amount of improvement may be expected from a limited 
period of intensive teaching and practice in grammatical usage? 

2. What correlation exists between mental ability and improve- 
ment in the application of grammatical principles? 

3. What principles of grammatical usage included in the study are 
the most difficult to learn? 


PROCEDURES 


Selecting the Pupils.—The experiment was carried on with ninth 
and tenth grade pupils in the Greenville (Ohio) High School. On 
the basis of scores derived from giving the Otis Self-administering 
Test of Mental Ability, Higher Examination, Form B and the Guiler- 
Henry Preliminary Diagnostic Test in Grammatical Usage to three 
hundred sixty pupils in the grades mentioned above, three equivalent 
groups of one hundred pupils each were formed. The sixty pupils 
in the group tested that were not included in the study were those 
whose scores indicated that they did not need remedial instruction 
in grammatical usage. The three groups were equated by matching 
each pupil selected for one group with a pupil of like intelligence and 
grammatical usage ability for each of the other two groups. Each 
group thus selected consisted of fifty-eight ninth grade pupils and 
forty-two tenth grade pupils. For convenience of later discussion, 
these groups will be hereafter referred to as (a) the experimental 
group, (b) the control group, and (c) the check group. 

¹ Data were derived from an unpublished Master’s thesis, Department of 
Education, Miami University, 1931. Thesis was written by Mr. Paul C. Warner 
under the direction of Professor Walter S. Guiler. 

Duration of the Experiment.—The experiment extended over twelve 
class periods of fifty-five minutes each. Precautions were taken to 
secure control over non-experimental factors. The instruction 
materials which were utilized in the study were kept in the classrooms 
and were in the pupils’ hands only during the twelve class periods. 
The time element was kept uniform for each pupil. If a pupil was 
absent one or more periods, he was required to make up the time lost 
before the final test was given. 

Control of Teacher Personality Factor.—In order to control the 
teacher personality factor, the experimental group and the control 
group were each divided as evenly as possible into six classes. Two 
experimental classes and two control classes were then assigned to 
each of the three teachers who co-operated in the experiment. In 
the experimental classes an individualized instruction technique 
was used, while in the control classes a group instruction technique 
was employed. The pupils comprising the check group did not 
receive any instruction at all in English during the course of the 
experiment. These pupils used their time in the study of their other 
subjects. 

Instructional Technique.—The variable element in the experiment 
consisted in the types of teaching technique that were employed. 
In the control classes neither the pupils nor their teachers were con- 
scious of the particular principles of grammatical usage in which 
individual pupils were weak. Both the teachers and pupils of the 
experimental and control classes had been told, however, that a retest 
in grammatical usage would be given at the close of the experiment 
in order to measure improvement. Hence, as a defense measure, the 
teachers of the control classes felt it necessary to instruct all the 
pupils in all of the forty-five principles of grammatical usage included 
in the pupils’ workbooks. 

In the experimental classes, the learning situation was entirely 
different from the one which obtained in the control classes. Both the 
teachers and pupils of the experimental classes were fully aware of 
the particular principles of grammatical usage, in the application of 
which each learner had encountered difficulty in the preliminary test. 
In order to discover and overcome pupil difficulties in the experimental 
classes, the following steps were taken. First, the preliminary test 
papers were carefully analyzed for individual errors. Second, diag- 
nostic charts were made showing the particular principles of grammati- 
cal usage with which each pupil had encountered difficulty. Each 
of the three teachers was provided with a diagnostic chart for each of 
her experimental classes. A limited segment of one of the diagnostic 
charts is presented below. Each cross in the chart indicates that 
some pupil made an error in the application of a particular principle 
of grammatical usage included in the preliminary test. Third, the 
learning needs of each pupil were clearly indicated in his own work- 
book. This statement of needs pointed each pupil to his own weak- 
nesses in a very definite manner. Fourth, the classroom work was 
organized on the basis of specific pupil needs. The entire class received 
instruction and practice on those principles which caused difficulty 
for the majority of the pupils. When only a limited number of pupils 
encountered a given difficulty, instruction and practice were organized 
for the particular pupils concerned. 
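
The fourth step, deciding which principles go to the whole class and which to a few pupils, can be sketched as follows. The sketch assumes that "the majority" means more than half of the class; the function name `plan_instruction` and the pupil labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def plan_instruction(errors_by_pupil):
    """Split principles into whole-class work and small-group work.

    errors_by_pupil maps a pupil label to the set of principle numbers
    missed on the preliminary test (the crosses on a diagnostic chart).
    Assumption: a principle is taught to the whole class when more than
    half of the pupils missed it; otherwise only the pupils concerned
    are grouped for it.
    """
    n_pupils = len(errors_by_pupil)
    counts = Counter(p for missed in errors_by_pupil.values() for p in missed)
    whole_class = sorted(p for p, c in counts.items() if c > n_pupils / 2)
    small_groups = {
        p: sorted(name for name, missed in errors_by_pupil.items() if p in missed)
        for p, c in sorted(counts.items())
        if c <= n_pupils / 2
    }
    return whole_class, small_groups


if __name__ == "__main__":
    # Hypothetical chart for four pupils.
    chart = {"A": {1, 5}, "B": {5}, "C": {5, 8}, "D": {2}}
    print(plan_instruction(chart))
```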


CHART I.—DIAGNOSTIC CHART SHOWING SPECIFIC PRINCIPLES OF GRAMMATICAL 
USAGE IN WHICH CERTAIN TENTH-GRADE PUPILS ENCOUNTERED 
DIFFICULTY 

Principles with which difficulty was encountered 

1. The subject of a sentence and its verb must [illegible] 
2. A compound subject made up of two singular nouns joined by or or 
   nor is followed by a sin[gular verb] 
3. Pronouns referring to each, every, either, neither, everybody, etc., 
   must be singular 
4. A pronoun must agree in person with its ante[cedent] 
5. The possessive case is used for a noun or a pronoun modifying a 
   gerund ............... x | x | x | x | x | x | .. | x | x | .. | .. | x 
6. A word used in apposition agrees in case with [its antecedent] 
7. The subjunctive mode is used to express a [illegible] 
8. A gerund phrase must clearly attach itself to the noun or pronoun 
   which it modifies 

[The crosses for principles 1 to 4 and 6 to 8 are illegible.] 


Measuring Improvement.—In order to obtain an objective measure 
of the amount of improvement that had taken place, the three hundred 
pupils comprising the experimental, control, and check groups were 
tested at the close of the twelve periods of study by means of the 
Guiler-Henry Retest in Grammatical Usage. The retest was the 
equivalent of the preliminary test in content and in difficulty. Both 
of these tests cover forty-five principles of grammar, and the application 
of each, with two exceptions, is measured two or more 
times. Twenty of the principles involve the use of verbs in their 
various person, number, voice, mode, and tense relationships; eighteen 
principles involve the use of pronouns in their different gender, person, 
number, and case relationships; three principles involve the use of 
adjectives and adverbs; and four principles involve the recognition of 
dangling modifiers. Each test item has a value of one point, and the 
highest possible score is ninety-seven points. The conditions under 
which both tests were given were kept uniform. 


RESULTS 


The results of the experiment are presented in the following tables. 
The first three tables show the extent to which the pupils in the 
experimental, control, and check groups had improved in ability to 
recognize the grammatical correctness or incorrectness of sentences. 
Table I records on the basis of test scores the amount of improvement 
that was made. The average scores attained by the experimental 
group, the control group, and the check group on the initial test are 
given in the first column of Table I. The second column shows the 
average score made by each of the three groups on the final test. 
The third column gives the average increase in point score for each 
group from initial to final test. The last column shows the per- 
centage of improvement made by each group. In computing the 
percentage of improvement made by each group, the actual gain in 
point score of the final over the initial test was divided by the possible 
gain in point score. 

TABLE I.—AVERAGE SCORES MADE BY THE DIFFERENT PUPIL GROUPS ON INITIAL 
AND FINAL TESTS¹ 

                                (1)         (2)          (3)            (4) 
                              Average     Average      Average       Percentage² 
                              score on    score on    increase in        of 
                              initial      final      point score    improvement 
                               test        test      of final over 
                                                      initial test 

Ninth grade pupils. 
  Experimental group.........   25.6        46.6         21.0           29.4 
  Control group..............   24.9       [3]9.1       [1]4.3          19.6 
  Check group................   25.3        29.8          4.5            6.3 
Tenth grade pupils. 
  Experimental group.........   36.2        58.3         22.[1]         36.4 
  Control group..............   37.1        50.8         13.7           22.8 
  Check group................   36.6        39.7          3.[1]          5.2 
Both grades. 
  Experimental group.........   30.1       [5]1.5        21.[4]         32.1 
  Control group..............   30.1        44.0         13.[9]         20.8 
  Check group................   30.0        33.9          3.9            5.9 

¹ All averages were computed from the test scores. 
² Percentage of improvement was computed by dividing the actual gain in point 
score by the possible gain in point score. 

Reference to the percentage of improvement made by the experi- 
mental group of ninth grade pupils will serve to show how the com- 
putation was actually made. Thus, there were fifty-eight pupils 
in the group, and the highest possible test score was 97 points. Hence, 
the pupils in this group might have made a total score of 5626 points 
(58 × 97). Their total score on the initial test was 1488 points and 
on the final test 2704 points. The possible gain for the group from 
initial to final test was 4138 points (5626 − 1488). The actual gain 
was 1216 points (2704 − 1488). Accordingly, the percentage of 
improvement was 29.4 (1216 ÷ 4138). 
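
The computation just described can be restated as a short Python sketch. The function name and argument names are illustrative only; the figures are those quoted in the text for the ninth grade experimental group.

```python
def percentage_of_improvement(n_pupils, max_score, initial_total, final_total):
    """Actual gain divided by possible gain, expressed as a percentage."""
    possible_total = n_pupils * max_score           # 58 x 97 = 5626
    possible_gain = possible_total - initial_total  # 5626 - 1488 = 4138
    actual_gain = final_total - initial_total       # 2704 - 1488 = 1216
    return 100 * actual_gain / possible_gain


if __name__ == "__main__":
    # Ninth grade experimental group, totals as given in the text.
    print(round(percentage_of_improvement(58, 97, 1488, 2704), 1))  # 29.4
```

Dividing by the possible gain rather than by the initial score keeps a group that started high from appearing to improve less merely because it had less room left to gain.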

The most outstanding fact revealed by an analysis of the data 
in Table I is that individual instruction is decidedly more effective 
than group instruction in the remediation of shortcomings in gram- 
matical usage. The data in the last column of the table show that 
the percentage of improvement made by the pupils under individual 
instruction was much greater in every case than that made by the 
pupils under group instruction; moreover, the consistency with which 
individual instruction maintained its superiority over mass instruction 
in both the ninth and the tenth grades is quite marked. The per- 
centage of improvement made by the individual instruction group 
and the mass instruction group in the ninth grade was 29.4 and 19.6 
respectively; corresponding figures for the tenth grade are 36.4 and 
22.8. The percentage of improvement for both grades amounted 
to 32.1 for the experimental group and 20.8 for the control group. 
It is quite interesting to note in passing that the pupils in the check 
group in each grade made some progress in grammatical usage during 
the course of the experiment even though they received no formal 
instruction in the subject. 

The tests used in the experiment covered forty-five principles of 
grammatical usage. Data bearing on the number and percentage 
of the principles with which difficulty was encountered by the various 
groups of pupils in the initial and final tests are presented in Table II. 
The average number of principles with which the various groups 
encountered difficulty in the initial and final tests are shown respec- 
tively in the first and second columns. The third column shows the 
average decrease from initial test to final test in the number of prin- 
ciples which caused difficulty. The percentage of decrease from 
initial to final test in the number of principles with which difficulty 
was encountered by each pupil group appears in the last column. 

TABLE II.—AVERAGE NUMBER OF PRINCIPLES OF GRAMMATICAL USAGE WITH 
WHICH DIFFICULTY WAS ENCOUNTERED BY THE EXPERIMENTAL, CONTROL, 
AND CHECK GROUPS ON INITIAL AND FINAL TESTS¹ 

                              Average number of         (3)              (4) 
                              principles with         Average       Percentage of 
                              which difficulty      decrease in      decrease in 
                              was encountered        number of        number of 
                                                   principles with  principles with 
                                (1)       (2)      which difficulty which difficulty 
                              Initial    Final     was encountered  was encountered 
                               test      test       from initial     from initial 
                                                    to final test    to final test 

Ninth grade pupils. 
  Experimental group.......    27.2      20.3         [6.]9            25.5 
  Control group............    27.1      22.8          4.3             16.0 
  Check group..............    27.9      26.0          1.9              7.0 
Tenth grade pupils. 
  Experimental group.......    24.5      15.8          8.7             35.6 
  Control group............    23.1      19.1          4.0             17.5 
  Check group..............    24.1      22.8          1.3              5.4 
Both grades. 
  Experimental group.......    26.1      18.4          7.7             29.5 
  Control group............    25.4      21.2          4.2             16.6 
  Check group..............    26.3      24.6          1.7              6.4 

¹ All averages were computed from the test scores. 

Analysis of Table II shows that individual instruction was decidedly 
more effective than group instruction in helping pupils master 
principles which help to determine the grammatical correctness 
or incorrectness of sentences. The average number of principles 
with which ninth grade pupils encountered difficulty was reduced 
25.5 per cent under individual instruction and only 16.0 per cent under 
group instruction; corresponding figures for the tenth grade were 
35.6 and 17.5. In both grades the average number of principles 
which caused difficulty was reduced 29.5 per cent under individual 
instruction and only 16.6 per cent under mass instruction. A certain 
percentage of decrease in the number of principles which caused 
difficulty is also to be observed in the case of the check group in each 
of the two grades. 

Error quotient data for the different pupil groups included in the 
experiment are presented in Table III. Average error quotients 
for the various groups on the initial and final tests are found in the 
first and second columns. The third column shows the average 
amount of decrease in error quotient from initial to final test. The 
percentage of reduction in size of error quotient from initial to final 
test is recorded in the last column. 

TABLE III.—AVERAGE ERROR QUOTIENT PER PUPIL 

                               (1)          (2)           (3)             (4) 
                             Average      Average       Average       Percentage of 
                              error        error       amount of      decrease in 
                             quotient     quotient    decrease in    error quotient 
                            on initial    on final   error quotient   from initial 
                               test         test      from initial    to final test 
                                                      to final test 

Ninth grade pupils. 
  Experimental group.....     .366         .257          .109            29.7 
  Control group..........     .375         .300          .075            19.9 
  Check group............     .367         .348          .019             5.3 
Tenth grade. 
  Experimental group.....     .311         .203          .108            34.6 
  Control group..........     .306         .244          .062            20.4 
  Check group............     .311         .298          .013             4.1 
Both grades. 
  Experimental group.....     .343         .235          .108            31.6 
  Control group..........     .346         .276          .070            20.1 
  Check group............     .344         .327          .017             4.9 

The idea of using error quotients 
as a measure of mastery seems to have originated with Stormzand 
and O’Shea, who state that these quotients are “determined by using 
the frequencies of error for an individual or for a group as a numerator 
of a fraction in which the denominator shall represent chances for 
error.”¹ Since the error quotient considers the number of mistakes 
with relation to the number of opportunities to make mistakes, it is a 
much more significant and valid measure of the prevalence of error 
than is a mere count of errors. 

Reference to the top figure in the first column will serve to show 
how the error quotients were computed. Thus, there were fifty-eight 
pupils in the experimental group in the ninth grade. Since the test 
covers ninety-seven items, there were 5626 chances (97 × 58) for 
this group to make errors. The number of errors actually made 
by the group was 2059; hence, the error quotient was .366 (2059 ÷ 
5626). 
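
A minimal Python sketch of the error quotient, and of the percentage reduction reported in the last column of Table III, follows. The function names are illustrative. The reduction is computed here from the rounded quotients, so it may differ by a tenth from the printed figures if those were derived from unrounded totals.

```python
def error_quotient(errors, n_pupils, n_items):
    """Errors actually made divided by chances for error."""
    chances = n_pupils * n_items  # e.g. 58 x 97 = 5626
    return errors / chances


def percent_reduction(initial_quotient, final_quotient):
    """Decrease in error quotient as a percentage of the initial quotient."""
    return 100 * (initial_quotient - final_quotient) / initial_quotient


if __name__ == "__main__":
    # Ninth grade experimental group, initial test (figures from the text).
    print(round(error_quotient(2059, 58, 97), 3))  # 0.366
    # Check group, both grades (quotients as printed in Table III).
    print(round(percent_reduction(0.344, 0.327), 1))  # 4.9
```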

An analysis of Table III shows that individual instruction was 
much more effective than group instruction in reducing the size of 
error quotients. The error quotients for the experimental and control 
groups in the ninth grade were reduced 29.7 per cent and 19.9 per cent 
respectively; corresponding figures for the tenth grade were 34.6 
and 20.4. For both grades the percentage of reduction in error 
quotient amounted to 31.6 for the experimental group and to 20.1 
for the control group. A certain amount of reduction in error quotient 
also took place in the check group in each grade. 

One of the minor problems with which the experiment was con- 
cerned relates to the amount of improvement in grammatical usage 
which may be expected in a limited period of intensive teaching and 
practice. Table IV, which was constructed from data contained in 
Tables I, II, and III, shows the amount of improvement that was 
made in twelve class periods by groups of pupils equated on the basis 
of intelligence and achievement in grammatical usage. The reader 
should recall that instruction was individualized in the experimental 
group, that mass instruction was employed in the control group, 
and that no formal instruction in grammar was given the check group 
during the course of the experiment. A study of Table IV shows 
that for the ninth and tenth grades combined, the percentage of 
improvement in test score was 5.9 in the check group, 20.8 in the 
control group, and 32.1 in the experimental group. For both grades 
combined, the percentage of decrease in number of principles with 
which difficulty was encountered was 6.4 in the check group, 16.6 
in the control group, and 29.5 in the experimental group. For both 
grades combined, the percentage of reduction in size of error quotient 
was 4.9 in the check group, 20.1 in the control group, and 31.6 in the 
experimental group. 

¹ Stormzand, Martin J., and O’Shea, M. V.: “How Much English Grammar?” 
Baltimore: Warwick and York, Inc., 1924, p. 14. 


TABLE IV.—IMPROVEMENT RESULTING FROM INTENSIVE TEACHING AND PRACTICE 
IN GRAMMATICAL USAGE 

                                                    Check    Control   Experi- 
Improvement stated in terms of                      group     group    mental 
                                                                       group 

1. Percentage of increase in test score.¹ 
     Ninth grade................................     6.3      19.6      29.4 
     Tenth grade................................     5.2      22.8      36.4 
     Both grades................................     5.9      20.8      32.1 
2. Percentage of decrease in number of princi- 
   ples with which difficulty was encountered. 
     Ninth grade................................     7.0      16.0      25.5 
     Tenth grade................................     5.4      17.5      35.6 
     Both grades................................     6.4      16.6      29.5 
3. Percentage of reduction in size of error 
   quotient per pupil. 
     Ninth grade................................     5.3      19.9      29.7 
     Tenth grade................................     4.1      20.4      34.6 
     Both grades................................     4.9      20.1      31.6 

¹ Percentage of increase in test score was computed by dividing the actual gain 
in test score by the possible gain in test score. 


The second minor problem sought to discover the correlation 
which exists between intelligence and improvement in grammatical 
usage. On this problem the data presented in Table V throw con- 
siderable light. An examination of the tabular data shows, on the 
whole, that the pupils with the better intelligence scores made the 
greatest amount of improvement. When the pupils in the experi- 
mental and control groups are arranged into quarter groups on the 
basis of intelligence quotients, it is found that every intelligence group, 
except the first, made more improvement in grammatical usage than 
did the group below. When the pupils in the experimental and control 
groups are arranged into halves on the basis of intelligence quotients, 
it is found that the upper half in intelligence made more improvement 
in grammatical usage than did the lower half. The improvement 
in grammatical usage made by the better intelligence half of the 
experimental group over that made by the lower intelligence half 
of the same group is indeed quite marked. 
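
The quarter-group comparison can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the records are invented, the function name `gain_by_quarter` is hypothetical, pupils are simply ranked by intelligence quotient and cut into four groups of equal size (any remainder goes to the lowest quarter), and the percentage of possible gain is taken against the 97-point maximum.

```python
def gain_by_quarter(records, max_score=97):
    """Summarize gain for each intelligence-quotient quarter.

    records is a list of (iq, initial_score, retest_score) tuples.
    Returns four dicts, highest-IQ quarter first.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    size = len(ranked) // 4
    summary = []
    for q in range(4):
        # The last quarter absorbs any pupils left over by the division.
        group = ranked[q * size:(q + 1) * size] if q < 3 else ranked[3 * size:]
        n = len(group)
        mean_initial = sum(r[1] for r in group) / n
        mean_retest = sum(r[2] for r in group) / n
        gain = mean_retest - mean_initial
        summary.append({
            "mean_iq": sum(r[0] for r in group) / n,
            "gain": gain,
            "pct_possible_gain": 100 * gain / (max_score - mean_initial),
        })
    return summary


if __name__ == "__main__":
    # Hypothetical records: (IQ, initial score, retest score).
    pupils = [(120, 40, 60), (118, 38, 62), (108, 34, 60), (106, 36, 64),
              (101, 28, 46), (100, 26, 48), (92, 18, 34), (90, 20, 36)]
    for row in gain_by_quarter(pupils):
        print(row)
```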








TABLE V.—RELATION OF INTELLIGENCE QUOTIENT TO INCREASE IN TEST SCORE 

                           Average      Average    Average    Average     Percentage 
Intelligence-quotient      intelli-     initial    retest     increase    of possible 
group                      gence       test score   score     in point       gain 
                           quotient                            score 

Experimental group. 
  First quarter......       118.6        39.7       62.7       23.0         40.1 
  Second quarter.....       106.9        34.8       62.0       27.2         43.8 
  Third quarter......       100.8        27.1       46.7       19.6         28.1 
  Fourth quarter.....        91.7        18.6       34.6       16.0         20.4 
Control group. 
  First quarter......       119.5        41.0       52.0       11.0         19.6 
  Second quarter.....       109.5        30.5       46.8       16.3         24.5 
  Third quarter......       102.1        27.3       42.2       14.9         21.4 
  Fourth quarter.....        89.9        21.4       35.0       13.6         17.9 
Check group. 
  First quarter......       114.5        39.0       42.2        3.2          5.4 
  Second quarter.....       109.8        33.5       36.8        3.3          5.3 
  Third quarter......        98.9        26.7       32.3        5.6          8.0 
  Fourth quarter.....        85.1        20.7       24.4        3.7          4.8 
Experimental group. 
  Upper half.........       112.8        37.3       62.4       25.1         42.0 
  Lower half.........        96.2        22.9       40.7       17.8         24.0 
Control group. 
  Upper half.........       114.5        35.8       49.4       13.6         22.3 
  Lower half.........        96.0        24.4       38.6       14.2         19.6 
Check group. 
  Upper half.........       112.1        36.3       39.5        3.2          5.3 
  Lower half.........        92.0        23.7       28.3        4.6          6.3 




















The third minor problem was concerned with discovering the 
particular principles of grammatical usage that are the most difficult 
to learn. Possibly the best way of ascertaining the learning difficulty 
of the various principles consists in the following steps in the order 
given: (1) Discovering for each pupil the particular principles with 
which he encounters difficulty; (2) subjecting each pupil to intensive 
teaching and practice in the principles with which difficulty was 
encountered; and (3) discovering, in terms of error quotients, the 
extent to which the pupils make errors in the application of the 
various principles after intensive teaching and practice have ceased. 
These steps of procedure were observed with the pupils in the experi- 
mental group. The principles having an error quotient of .250 
and above for ninth grade pupils, as determined by the final test, 
are presented in Table VI. 


TABLE VI.—PRINCIPLES OF GRAMMATICAL USAGE HAVING THE HIGHEST ERROR 
QUOTIENTS FOR THE EXPERIMENTAL GROUP OF NINTH GRADE PUPILS 
IN THE FINAL TEST¹ 

                                                                        ERROR 
STATEMENT OF PRINCIPLES                                                QUOTIENT 
                                                                       IN FINAL 
                                                                         TEST 

 1. The present participle should be used only to denote an action 
    consistent with the time of action of the main verb. Example: 
    Having asked his friend for a dollar, he was soon enjoying a 
    good meal....................................................    .612 
 2. General truths, or statements which are still true, are put in the 
    present tense. Example: He always declared that blood is 
    thicker than [water]..........................................    .500 
 3. A gerund phrase must clearly attach itself to the noun or pronoun 
    which it modifies. Example: While talking to the principal, I 
    was told that I had failed in English.........................    .456 
 4. A pronoun must have only one antecedent to which it might seem 
    to refer. Example: Behind the house was a row of trees which 
    my [illegible]................................................    .387 
 5. The antecedent of a reference pronoun must be expressed and not 
    inferred or understood. Example: Holland is a thrifty country; 
    all the people there save money...............................    .379 
 6. An infinitive phrase must clearly attach itself to the noun or 
    pronoun which it modifies. Example: To enjoy music thoroughly, 
    one must be in the right mood.................................    .370 
 7. The following nouns are always considered plural: Oats, riches, 
    eaves, proceeds, trousers, pincers, shears, links (golf), annals, 
    nuptials, and scissors........................................    .362 
 8. The case of a noun or pronoun following than or as is determined 
    by its use in the clause expressed in full. Example: I value him 
    as an employee as highly as her...............................    .362 
 9. A participle must clearly attach itself to the noun or pronoun 
    which it modifies. Example: Rounding a turn in the road, we 
    saw the village...............................................    .362 
10. A reference pronoun agrees in number with the nearer noun when 
    the antecedent is made up of a singular and a plural noun joined 
    by or or nor. Example: Neither the director nor the two 
    violinists have paid [illegible]..............................    .353 
11. The possessive case is used for a noun or pronoun modifying a 
    gerund. Example: There was no real hope of its saving his life    .353 
12. Infinitives are present unless they represent an action earlier 
    than that of the main verb. Example: He was sorry not to be 
    able to go....................................................    .353 
13. A compound subject made up of both singular and plural nouns is 
    followed by a verb which agrees in number with the nearer noun. 
    Example: Either the policemen or the firechief is down on the 
    front [illegible].............................................    .327 
14. The possessive case should indicate possession, not the object of 
    an action. Example: The death of Lee came as a shock to both 
    North [illegible].............................................    .318 
15. A word used in apposition agrees in case with its antecedent. 
    Example: Three boys—John, James, and I—are to go..............    .318 
16. Pronouns referring to each, every, either, neither, everyone, 
    everybody, anyone, anybody, etc., must be singular. Example: 
    Everyone is asked to give his pennies to the fund for the poor    .258 
17. Reference pronouns like this, that, which and it should be used to 
    refer to a single definite word as an antecedent and not to a 
    clause or an idea. Example: I reveled in the large library, a 
    fact which my aunt [illegible]................................    .258 

¹ The principles stated above are those in a list of forty-five principles having 
error quotients of .250 and above in the final test. 


SUMMARY 


The following statements, which are supported by the data that 
have been presented, are made by way of summary and conclusion. 

1. Individual instruction based on individual diagnosis is found 
to be much more effective than mass instruction in the remediation 
of shortcomings in grammatical usage. This statement is based 
on an analysis of (a) test scores, (b) principles of grammatical usage 
in which difficulty was encountered, and (c) error quotients. 

2. A large amount of improvement in grammatical usage was 
made by the students in both the experimental group and the control 
group. 

3. A very definite relation exists between intelligence as measured 
by the Otis Self-administering Test of Mental Ability, Higher Exam- 
ination, and improvement in ability to recognize the grammatical 
correctness or incorrectness of sentences. The effect of intelligence on 
improvement under individual instruction is quite marked. 

4. Judged on the basis of error quotients in the final test, certain 
principles of grammatical usage are more difficult to learn than are 
other principles. 

5. The pupils included in the experiment varied greatly in their 
mastery of the total field of grammatical usage and in their mastery 
of specific principles. The scores on the preliminary test showed 
that certain pupils in the tenth grade were below the standard for 
the eighth grade, while other pupils ranked above the standard for 
college freshmen. 

6. In both the ninth and the tenth grades, many pupils were 
found who had not mastered certain principles which presumably 
had been learned in the early grades of the elementary school. 
















ABNORMAL VS. NORMAL PSYCHOLOGY AS A BASIS 
FOR THE ELEMENTARY COURSE 


KENNETH SELTSAM¹ 
Department of Psychology, University of Minnesota 


That the introductory course in psychology has for some time been 
rather unsatisfactory in many instances is evidenced by the number of 
symposiums held, round tables conducted, and articles written on the 
subject. To be sure, this unsatisfactory state of affairs is not universal. 
It is nevertheless widespread enough to justify certain considerations, 
with the idea in view of contributing something to the general concep- 
tion of the place the beginning study in psychology should occupy. 

There is perhaps no more thorough analysis of the difficulties 
inherent in the typical course, as now taught, than that of Schoen, 
several years ago.² His classification lists the weaknesses of the course 
in psychology as: Training for advanced work rather than for an 
appreciation of the subject; emphasizing a mass of more or less mean- 
ingless data; the presentation of systematic conflict, which at best 
should be confined to the advanced courses in theory; illogical sequence 
in the conduct of the course, leaving the student as much “at sea” at 
the conclusion as at the beginning; over-emphasis on theory and eva- 
sion of application; and finally, the inclusion of contentious material 
with regard to certain concepts. 

Only in a short closing paragraph of an outline presented by this 
same author is “the organism as a whole, as it functions normally and 
abnormally” presented for study. It is the purpose of the present 
paper to consider this tendency for the personality, normal and 
abnormal, to be shunted off in the elementary course into an insignifi- 
cant or non-existent position. It is the contention of the writer that a 
great deal of failure of the introductory course hinges upon this fact. 

Before attempting to indicate the possible adaptation of abnormal 
psychology to the position of an introductory course, we must no 
doubt show wherein such a study is congruent with the essential aims 
and purposes of the course. Beginning psychology as now conducted 
proceeds on the assumption that there is such a thing as a normal 
individual, which leads one immediately to a host of questions: Is not 
the so-called “normal” psychological individual a mere statistical 
myth—as much so as the anatomical normal Davenport constructed 
from a mass of army records? Do not such classifications of personal- 
ity as Conklin’s “ambivert” have value only as recognized theoretical 
abstraction from real phenomena? For the advanced student perhaps 
there is no difficulty in realizing the nature of the normal individual 
in question. For the elementary student, on the other hand, the 
concept of normality, as a mathematical sort of thing, is not 
clear. 

¹ Deceased November 30, 1930. 
² Schoen, Max: The Elementary Courses in Psychology. American Journal of 
Psychology, Vol. XXXVII, 1926, pp. 593-599. 

Anyone who reads the reports of the American Medical Association, 
in particular those comparing the number of beds occupied by mental 
cases and commoner diseases, cannot help wondering if it is not, after 
all, psychologically a bit normal to be abnormal, however paradoxical 
such a statement may sound. This is but one indication of a change that 
is coming in our general conception of abnormality. When compared 
with our modern discussions, treatises that found their way years ago 
into the Journal of Insanity seem particularly queer. Along with our 
appreciation of the relativity of all things, has come a recognition of 
gradations in the social acceptability of certain personality integra- 
tions. To the student first entering a class in academic psychology, 
nothing is more genuinely real than are these facts of personality 
organization, and disorganization. They are not foreign. However 
great his isolation, he cannot have avoided human individuals. It 
would seem then that the abnormal field provides quite as advantage- 
ous a basis for the introductory work as does the normal. 

In a study conducted at the University of Minnesota,¹ by the 
questionary method during a period of two years upon nearly three 
thousand elementary psychology students, certain facts were found 
bearing on this general problem. When asked which section of the 
course was most enjoyed, the response of most students was in favor of 
that dealing with abnormal psychology. Again when questioned as to 
the amount of the course spent on each section, the only change which 
the majority of students suggested was an increase in emphasis on 
personality integration. It would seem then that our speculations as 
to the attitudes of students toward the abnormal phase of psychology 
are not entirely of the “arm-chair” variety. 





1 Longstaff, H. P.: Analysis of Some Factors Conditioning Learning in 
General Psychology. J. Applied Psychol., Vol. XVI, Nos. 1 and 2, 1932. 




A criticism of the idea here presented, which comes perhaps before 
any other, is that the mass of experimental data which goes to turn 
psychology’s face toward scientific standing would in such a course be 
sadly neglected. True enough a great deal of the remarkable accu- 
mulation of experimental studies which strikingly dominate the some 
five or six new texts which have appeared in the last two years, might 
necessarily be ignored. But then, one has yet to find proof that those 
studies have functioned efficiently either in the development of an 
appreciation in beginning non-professional students, or of a truly 
scientific understanding in the case of those who major in psychology. 
The elementary pupil is thrust into an experiment, for example, in 
visual perception of movement. In due time, the instructor assumes, 
often quite unjustifiably, that the material has “taken.” In most 
cases, the student has “been taken.” He may even pass the master’s 
degree level with an inadequate understanding of the phenomena in 
question. All the while it is falsely taken for granted that he has 
learned certain facts in the elementary course. 

Approaching the problem from a different angle, it is altogether 
possible that many of the facts of experimental psychology might be 
presented with greater satisfaction from the abnormal than from the 
normal standpoint. Is it not probable that an unschooled sophomore 
will come to appreciate the functioning of the nervous system by 
being shown the effects of certain disturbances, as for example in 
paresis or epilepsy? How much more real to such an individual is a 
discussion which shows the effect on the approach to life of visual 
abnormalities—nystagmus, or myopia—than a passage which attempts 
to show wherein the blindspot is not blind. The beginner in psy- 
chology cannot help realizing that the human specimen with an 
IQ of one hundred, about which centers the usual consideration of 
intelligent behavior, is not the one, from the standpoint of everyday 
psychology, who needs consideration. He knows that in the community 
it is rather the imbecile, the moron, or the genius who demands 
understanding by those with whom he lives. Here we strike at the 
very heart of the aims of any course in psychology. 

It is not assumed that a course as here suggested would represent 
a “cure-all” for the elementary instruction. Such would, at best, 
be an absurd contention. It is maintained, however, that possibly 
personality integration, normal and abnormal, represents a profitable 
approach to the whole field of psychology. At any rate, we are hardly 
justified in being satisfied with the course when year after year, we 







receive notices, outstanding cases though they may be, of the votings 
of certain institutions with regard to the value of courses. Psychology 
should never stand as the most valueless course—as it has in a num- 
ber of instances. Surely Schoen is correct in saying, “Psychology 
should be the most vital course in the whole college curriculum.”¹ 





1 Schoen, Max: The Elementary Courses in Psychology. American Journal of 
Psychology, Vol. XXXVII, 1926, pp. 593-599. 













BOOK REVIEWS 


KNIGHT DUNLAP. Habits: Their Making and Unmaking. New 
York: Liveright, 1932. Pp. X + 326. 


The central theme of this book, which deals with habit in all its 
aspects, is to be found on p. 78. Here the author repeats from an 
earlier paper his three alternative theories of learning: (1) The alpha 
hypothesis, the old brain-path theory, namely, that the occurrence 
of a response increases the probability that when the same stimulus- 
pattern occurs the same total response will again occur; (2) the beta 
hypothesis, the reverse of the alpha, namely, that the occurrence of a 
response lessens the probability that on the recurrence of the same 
stimulus-pattern, the same response will recur; (3) the gamma hypothe- 
sis is that the occurrence of a response in itself has no effect on the 
probability of the recurrence of the response. 

For almost all learning, the alpha hypothesis fails, which is contrary 
to the general belief that “practice makes perfect.” Apparently 
the author believes that the beta hypothesis is the one to swear by. 
Later in the book, in the chapter on “the breaking of specific bad 
habits,” he takes three classes of bad habits (stammering, including 
stuttering; tics; and bad sexual habits or vices, such as masturbation 
and homosexuality) and shows how these may be broken by deliberately 
practising them. It is the cure by the hair of the dog that bit 
you, i.e., the beta hypothesis cure. It is, however, almost certain that 
stammering can best be cured by making the stammerer speak in a 
pitch (usually lower) other than the one he is accustomed to use and 
this is the method which should be employed, rather than that of 
deliberately practising the stammering act. 

There is a wealth of sound and challenging material in the book, 
but the “I-know-it-all” attitude, which intrudes too often, arouses the 
opposition of the reader. The bibliography is a magnificent one and 
reflects great credit on the compiler, Dr. Willis C. Beasley. The 
author spells practise and practice inconsistently, and also the name 
of Ebbinghaus. Finally the jacket reminds one of the blurbs around 
very popular treatises; it isn’t dignified enough for a good and serious 
study of Habit. 

P. SANDIFORD. 
University of Toronto. 




JEAN PIAGET. The Moral Judgment of the Child. New York: Harcourt, 
Brace & Co., 1932. Pp. IX + 418. 


This is the fifth volume in Piaget’s series of studies of children’s 
ideas, those preceding having dealt with language and thought, judg- 
ment and reasoning, the conception of the world, and the conception 
of causality. The present study is, to the reviewer, the most valuable 
of this significant series, in that it deals scientifically with the sources of 
moral behavior. 

Piaget states in his preface that he is concerned with moral judg- 
ment and not with moral behavior or sentiments. One wonders why 
he found it necessary to make this distinction. Do children really 
draw a line between precept and practice, as do their elders? If they 
do, then Piaget’s study is quite worthless, since it deals with words 
rather than with deeds. But fortunately there is no indication in 
his results that the children he consulted were as yet sufficiently 
“civilized” to find it necessary to use nice words as smoke-screens for 
their acts, so that his data bear upon moral behavior as well as upon 
moral judgments. 

The book should be pondered long and seriously by our moral 
guardians, whether parents, teachers or ministers. Piaget’s data, 
obtained from conversations with children from three to twelve years 
of age, indicate clearly that moral judgments of children change radically 
with growth in consciousness of selfhood. “For very young 
children,” the author finds, “a rule is a sacred reality because it is 
traditional; for the older ones it depends upon mutual agreement.” 
Conformity disappears as the child grows older, due to wider social 
experience. Piaget attributes this gradual displacement of constraint 
by cooperation to a progressive emancipation of the child from adult 
supervision. The fact seems to be, however, that adult restraint increases 
rather than decreases with age. There are more “dos” and 
“don’ts” meted out to the child of six than to the child of three, and 
they come from more sources. Thus the child is not increasingly 
emancipated from adult supervision, but increasingly subjected to it. 
This would seem to indicate that it is growth in consciousness of self- 
hood that accounts for the increased resistance to adult restraint, and 
not progressive elimination of such restraint. 

But whatever may be the causes in this progressive change in moral 
judgment from heteronomy to autonomy, Piaget’s data point to the 
danger of moral distortion when the moral restraint appropriate to 




three-year olds is exercised upon older children in an arbitrary 
manner. 

MAX SCHOEN. 
The Carnegie Institute of Technology. 


DAVID KENNEDY-FRASER. Education of the Backward Child. New 
York: D. Appleton and Company, 1932. Pp. VIII + 235. 


Mr. Kennedy-Fraser not only claims to have written a book tuned 
to the needs of the classroom teacher, but he has actually done so. 
The language which he uses in “Education of the Backward Child” is 
non-technical, and his philosophy understandable. He has achieved 
simplicity without patronage, exactness without pedantry. The 
volume is strongly recommended to the group for whom it is intended— 
teachers of backward children. 

The author, at present in charge of training teachers of mental 
defectives in Scotland, first makes it clear that he is dealing in this 
book with the two rather large groups of what he calls “more retarded” 
and “less retarded” children. The IQ range for the total number 
would be from fifty to eighty. According to his figures, such children 
constitute about ten per cent of the juvenile population. He considers 
some of the administrative procedures applicable to the education of 
dull pupils, and then proceeds with a clear-cut and instructive discus- 
sion of teaching methods in such subject-matter fields as reading, 
spelling, and handwork. 

The book contains little, if anything, that is new; but in an educa- 
tional world that so often gets itself into difficulties with its muddled, 
emotional thinking about “democracy” and “the same opportunities 
for all,” it is good to be reminded that “the purpose of educating the 
backward child is to fit him as well as possible for a useful and happy 
life in the community,” that “we must seek to educate the backward 
child to the level of his abilities,” and that “we must train him in such 
a way that he feels, not thwarted and unhappy, but contented and 
eager.” 

HERBERT A. CARROLL. 

University of Minnesota. 


HENRY DEXTER LEARNED. A Modern Introductory French Book. 
New York: Oxford University Press, 1932. Pp. XLIII + 317. 


Among the large number of beginning French texts on the market, 
Dr. Learned’s book represents something new and different. From 
the title to the last page of “English-French Index” the book remains 




true to its purpose. It claims to be “a contribution to modern 
language teaching,” a textbook defending no theory or method, which 
has grown slowly out of fresh experiments in the laboratory of the 
actual classroom covering a period of ten years. Account is taken of 
the latest researches in the field of modern language teaching. As to 
vocabulary, the author has profited by the latest word frequency 
counts of Henmon, Thorndike, Vander Beke, and Cheydleur. The 
author’s aim was to include only the most necessary words which make 
up a net active list of between six and seven hundred words, “the 
commonest in the language.” 

So that the student may be emancipated from “slavery to the dictionary,” 
he is to get his first contact with a French word wholly from the 
connected discourse at the head of each of the forty-two lessons in 
the book. Since the author believes that many teachers cannot arouse 
much enthusiasm for “chairs, chalk, and blackboards, or even for 
unseen squirrels and apple trees” nor for “imaginary travel experiences,” 
he has chosen Maupassant’s La Parure as the basic French text 
for lessons I to XXVI inclusive, and Erckmann and Chatrian’s L’Ami 
Fritz for the remaining sixteen lessons. This gives passages of excellent 
content and brings relief from the monotony of far too many 
beginning French books. Special vocabularies pertaining to the 
home, school, parts of the body, clothing, and food, and also a passage 
of connected discourse, “En Voyage,” are included in the appendix. 
Of the active vocabulary the most common words occur more than 
twenty-five times and the least common at least three different times 
in a French context at fairly regular intervals, the author states. The 
usual French-English and English-French vocabularies at the end of 
beginning books are replaced by a “French-English Vocabulary and 
Index” and an “English-French Vocabulary and Index.” In most 
cases the words are not translated, but reference is made to the lesson 
vocabularies and to the French text where they occur. Pedagogically 
the principle is sound, but for practical purposes the method is some- 
what cumbersome. 

Pages xxiii to xliii give a discussion of phonetics and phonetical 
transcription as applied to the students’ own language and to the 
French. The transcription using the symbols of the Association 
Phonétique Internationale is employed in the first ten lessons. The 
author’s statement that the “pronunciation of a foreign language in 
another country can be intelligently and successfully approached only 
through a study of phonetics” (p. ix) is open to criticism. Moreover, 




not all teachers have complete faith in the efficacy of phonetic tran- 
scription in teaching a beginning course in foreign languages. 

Each lesson begins with an extract of connected French discourse 
followed by: (1) French questions on this extract, (2) French sentences 
with certain untranslated English words to be translated, (3) English 
sentences to be rendered into French, (4) French idioms to be memor- 
ized and (5) grammatical material. At rather irregular intervals 
provision is made for a review of vocabulary and grammar. These 
exercises are somewhat brief and inadequate. Pages 185 to 217 
contain very good supplementary exercises pertaining to lessons 
X to XLII which might be improved by using less disconnected dis- 
course in the illustrative sentences. 

Different styles of type are used in presenting the idioms and 
grammatical material to call attention to particular difficulties and 
important points. The graphic manner in which the discussions of 
grammar are presented appeals to the eye, saves time, and assists 
memory. Dr. Learned’s idea is to forestall mistakes as far as possible, 
which is excellent pedagogy. Passages in point are, for example, the 
presentation of the partitive on pages fifteen to seventeen, the agreement 
of the past participle on page sixty-five, and the position of 
personal pronoun objects on pages ninety-six to ninety-nine. 

The author’s treatment of irregular verbs brings in new ideas and 
is a real contribution towards solving the problem of learning them. 
The reviewer is, however, rather sceptical about the author’s state- 
ment (p. ix) that all the irregular verbs can be mastered in the first 
year by means of his scientific classification. Only students of unusual 
talent would ordinarily accomplish such a feat. 

The occasional historical explanations of phonetic and orthographic 
difficulties are interesting and helpful. They add to the content of the 
course and will appeal especially to the able student. The original 
pen sketches by the author also make the book more attractive. 

On the whole, the book is an excellent piece of work incorporating 
many new ideas in the technique of teaching beginning French, and 
will be welcomed by progressive teachers generally. 

WILLIAM F. KAMMAN. 


Carnegie Institute of Technology. 













